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Foreword 



By the dawn of the new millennium, robotics has undergone a major transforma- 
tion in scope and dimensions. This expansion has been brought about by the ma- 
turity of the field and the advances in its related technologies. From a largely 
dominant industrial focus, robotics has been rapidly expanding into the challenges 
of the human world (human-centered and life-like robotics). The new generation 
of robots is expected to safely and dependably interact and work with humans in 
homes, workplaces, and communities providing support in services, entertainment, 
education, exploration, healthcare, manufacturing, and assistance. 

Beyond its impact on physical robots, the body of knowledge that robotics has 
produced is revealing a much wider range of applications reaching across diverse 
research areas and scientific disciplines, such as: biomechanics, haptics, neurosci- 
ences, and virtual simulation, animation, surgery, and sensor networks among oth- 
ers. In return, the challenges of the new emerging areas are proving an abundant 
source of stimulation and insights for the field of robotics. It is indeed at the inter- 
section of disciplines where the most striking advances happen. 

The Springer Tracts in Advanced Robotics (STAR) is devoted to bringing to 
the research community the latest advances in the robotics field on the basis of 
their significance and quality. Through a wide and timely dissemination of critical 
research developments in robotics, our objective with this series is to promote 
more exchanges and collaborations among the researchers in the community and 
contribute to further advancements in this rapidly growing field. 

Since its inception in 1994, the biennial Workshop Algorithmic Foundations of 
Robotics (WAFR) has established some of the field's most fundamental and lasting 
contributions. Since the launching of STAR, WAFR and several other thematic 
symposia in robotics find an important platform for closer links and extended 
reach within the robotics community. 

This volume is the outcome of the WAFR ninth edition and is edited by D. Hsu, 
V. Isler, J.-C. Latombe and M.C. Lin. The book offers a collection of a wide range 
of topics in advanced robotics, including motion planning, multiagents, modular and 
reconfigurable robots, localization and mapping, grasping, and sensing. 

The contents of the twenty-four contributions represent a cross-section of the 
current state of research from one particular aspect: algorithms, and how they are 
inspired by classical disciplines, such as discrete and computational geometry, 
differential geometry, mechanics, optimization, operations research, computer 



VIII Foreword 

science, probability and statistics, and information theory. Validation of algo- 
rithms, design concepts, or techniques is the common thread running through this 
focused collection. 

Rich by topics and authoritative contributors, WAFR culminates with this 
unique reference on the current developments and new directions in the field of 
algorithmic foundations. A very fine addition to the series! 



Naples, Italy Bruno Siciliano 

October 2010 STAR Editor 



Preface 



Robot algorithms are a fundamental build block of robotic systems. They enable 
robots to perceive, plan, control, and learn, in order to achieve greater autonomy. 
Today, the design and analysis of robot algorithms are more crucial than ever for 
at least two reasons: 

• Robotics is undergoing major transformation. Originally focused on indus- 
trial manufacturing, it is now rapidly expanding into new domains, such as 
homes and offices, elderly care, medical surgery, entertainment, ocean and 
space exploration, and search-and-rescue missions. In these new domains, 
tasks are less repetitive, environments are less structured, events are more 
unpredictable, and greater autonomy is required over long periods of time. 
It is often impossible to anticipate all events explicitly and to program the 
robots specifically to handle them. New algorithms that are adaptive to 
environment uncertainties and changes are needed to conquer these 
challenges. 

• Robot algorithms are finding new applications beyond robotics, for exam- 
ple, in designing mechanical assemblies, modeling molecular motion, creat- 
ing digital characters for video games and computer-generated movies, ar- 
chitectural simulation, and ergonomic studies. These non-traditional 
applications of robot algorithms pose new challenges: hundreds or thou- 
sands of degrees of freedom, large crowds of characters, complex physical 
constraints, and natural-looking motions. The resulting new algorithms may 
in turn benefit future robots. 

Robot algorithms are also rapidly evolving as a result of new technologies, e.g., 
low-cost parallel computers, cheaper and more diverse sensors, and new interac- 
tion technologies ranging from haptic to neuroprosthetic devices. 

Unlike traditional computer algorithms, robot algorithms interact with the 
physical world. They must operate safely, reliably, and efficiently under tight time 
constraints in imperfectly known environments. So, it is not surprising that the de- 
sign and analysis of robot algorithms raise unique combinations of fundamental 
questions in computer science, electrical engineering, mechanical engineering, and 
mathematics. For example, minimalist robotics studies the minimal sensing and 
actuation capabilities required for robots to complete a given task. It addresses not 
only computational complexity issues, but also "physical" complexity issues. 
Probabilistic methods are widely used as a modeling tool to handle uncertainties 
due to sensing and actuation noise, but they are also used as a computational tool 
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that avoids costly computations by extracting partial information and handling the 
resulting uncertainties. Many other such examples abound. 

The Workshop on Algorithmic Foundations of Robotics (WAFR) is a highly 
selective single-track meeting of leading researchers in the field of robot algo- 
rithms. Since its creation in 1994, WAFR has been held every two year and has 
published some of the field's most important and lasting contributions. 

The ninth WAFR was held on December 13-15, 2010 at the National Univer- 
sity of Singapore. It had a strong program of 24 contributed papers selected from 
62 submissions. Each paper was rigorously reviewed by at least three reviewers 
with additional input from two program committee members. The workshop also 
featured 6 invited speakers: Leonidas Guibas (Stanford University), Leslie 
Kaelbling (Massachusetts Institute of Technology), Jean-Pierre Merlet (INRIA 
Sophia-Antipolis), Jose del Millan (Ecole Polytechnique Federale de Lausanne), 
Yoshihiko Nakamura (University of Tokyo), Moshe Shoham (Technion - Israel 
Institute of Technology). A vibrant poster and video session was a new addition 
to this WAFR program to encourage open exchange of ideas in an informal at- 
mosphere. 

In addition to the editors of volume, the program committee consists of Srini- 
vas Akella (University of North Carolina at Charlotte), Dan Halperin (Tel Aviv 
University), Seth Hutchinson (University of Illinois at Urbana-Champaign), Vijay 
Kumar (University of Pennsylvania), Jean-Paul Laumond (LAAS-CNRS), 
Stephane Redon (INRIA Grenoble - Rhone-Alpes), Daniela Rus (Massachusetts 
Institute of Technology ), Katsu Yamane (Disney Research and Carnegie Mellon 
University). 

It was a real pleasure to organize the workshop and to work with such a tal- 
ented group of people. We owe many thanks to all the authors for submitting their 
exciting work, to the program committee members and reviewers for their dedica- 
tion to ensure the finest quality of WAFR, to the speakers for inspiring thoughts 
and ideas, and to all the participants for making the workshop a great success. We 
also wish to thank the following organizations for their generous financial support 
of this WAFR: 

• National University of Singapore, School of Computing 

• United States Air Force Office of Scientific Research, Asian Office of 
Scientific Research & Development. 

Finally we wish to thank the School of Computing at National University of 
Singapore for providing the logistic and technical support necessary to make this 
WAFR successful. 
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Homotopic Path Planning on Manifolds 
for Cabled Mobile Robots 

Takeo Igarashi and Mike Stilman 



Abstract. We present two path planning algorithms for mobile robots that are con- 
nected by cable to a fixed base. Our algorithms efficiently compute the shortest path 
and control strategy that lead the robot to the target location considering cable length 
and obstacle interactions. First, we focus on cable-obstacle collisions. We introduce 
and formally analyze algorithms that build and search an overlapped configuration 
space manifold. Next, we present an extension that considers cable-robot collisions. 
All algorithms are experimentally validated using a real robot. 

1 Introduction 

Mobile robots are typically untethered. This is not always desirable in household 
and high-power robotics. Wireless communication can be unreliable and batteries 
need to be charged regularly. These challenges can be solved by using cables for 
communication and power. Currently, cables are a viable option for robots that work 
in fixed environments such as homes and offices. The challenge addressed in this 
paper is that cables impose additional constraints on robot motion. First, robots 
cannot go further than the cable length. Second, they are blocked by the cable itself 
when the robots are not capable of crossing it. We present two practical planning 
algorithms that handle these constraints and validate them on a real robot system. 

The first challenge is that a cabled robot's movement is constrained by the length 
of the cable. If there is no obstacle, the robot's motion is limited to stay within a cir- 
cle around the fixed end-point of the cable. If there is an obstacle in the environment, 
the robot's movement is further constrained by the interaction between the cable and 
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Fig. 1 The problem domain and a challenging example where the robot must untangle its 
cable. 

the obstacle as shown in Fig.[Tfa) and (b). The locations of the robots are the same, 
but the shortest paths to the goal are different because the cable configuration in 
Fig. [Hb) cannot stretch to the goal. The second problem is that the robot's move- 
ment may be blocked by the cable when the robot is not capable of crossing it as 
shown in Fig. [He). The robot must make an auxiliary motion to move the blocking 
cable out of the way (Fig. [TJd)). This is difficult because the robot cannot directly 
control the cable. It must indirectly control it by pulling. 

In order to address these challenges. Section |3] introduces the overlapped man- 
ifold representation for the configuration space of cabled robots. We develop an 
efficient, resolution complete and optimal algorithm that constructs the manifold 
and solves practical planning problems. To handle collisions between the cable and 
the robot. Section |4] presents a second search method that applies physics-based 
simulation combined with heuristics to choose intermediate subgoals that maximize 
robot mobility. Section[5]experimentally demonstrates that both algorithms generate 
appropriate paths for a real robot that reaches targets in the presence of a cable. 



2 Related Work 

The topic presented in this paper is far more complex than general path planning 
|1]J,|2[]. While the robot itself only operates in two dimensions, the cable is also part 
of the complete system or plant. By including the cable, the challenge is lifted to 
planning for an infinite-dimensional underactuated system. Previous work on teth- 
ered robots [3] treated the problem as multi-robot scheduling. Our approach focuses 
on a single agent and handles environment geometry. 

Considerable research on high degree-of- freedom (DOF) robot systems such as 
ly, [so has direct applications to domains with dozens of DOF and non-holonomic 
constraints. Our problem, however, requires handling even higher DOF and under- 
actuation. Hence, the challenge is also distinct from deformable motion planning as 
presented in JaLZl]. Likewise, cable-routing |8] assumes that shape of the cable is 
directly controllable. However, a cabled robot cannot control all of its degrees of 
freedom and must rely on predictions of cable motion due to stretch. 

Existing planning methods for underactuated deformable objects typically fo- 
cus on local deformations 191 llOll . Studies on deformable needle steering also 
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consider the path to a robot configuration lllUll2|] with a focus on local environment 



deformation and curvature constraints. Our work complements these studies since 
our task is to determine globally optimal robot paths. Global constraints are im- 
posed by the cable length, wrapping around obstacles and potentially colliding with 
the robot. 

Typically, globally constrained underactuated planning and control is restricted to 



four DOF systems as shown in 113Lll4i ll51. To handle the global problem complex- 
ity our domain requires a di ffere nt approach based on topological path homotopy. 
Existing work in knot-tying lll6n plans with distinct topological states. However it 
explicitly encodes and plans rope overlaps. Other planners that distinguish homo- 



topic paths, II17L 1181 119,1. typically operate in a standard high-DOF configuration 
space. Instead, we build a configuration space manifold that implicitly encodes the 
homotopy of cable configuration and then search for shortest paths on the manifold. 
In direct homotopic planning, 1.20.1 studies shortest paths but restricts the domain 
to a boundary-triangulated space. Kill requires semi-algebraic models of obstacles. 



1 2211 gives a configuration space representation that closely related to our work. 
Their complex-plane mapping of paths may increase the efficiency of our methods 
for single-query search. In contrast to our proposed manifold, existing techniques 
do not address global cable-length constraints or cable-robot interactions. 

Existing methods for manifold construction tend to focus on relationships in 



recorded data II23L 12411 . We present a novel, simple algorithm for global path plan- 
ning with distance constraints on paths. The algorithm not only generates paths, but 
a complete vector field Il25..26:1 for robot motion on a manifold of homotopic paths. 



Our extensions to this algorithm also consider cable dynamics 1127112811 and evaluate 



strategies for robot motion when the cable itself is an obstacle in the space \2l 



3 Distance Manifolds: Cable-Obstacle Interaction 

We present a path planner for cabled mobile robots. The initial configuration, q,, 
includes initial cable displacement. The goal is any configuration qg that places the 
robot at pg in a 2D environment. The robot is connected to a fixed base location, 
po, by a cable resting on the floor. The cable is a flexible, passive entity whose 
shape is determined solely by the previous motions of the robot. The environment 
contains fixed obstacles that restrict both cable and robot motion. For simplicity, we 
assume a disk-shaped robot with a given diameter and a cable attached by a freely 
rotating joint. Furthermore, we represent space by a grid where configuration space 
obstacles must occupy at least one grid vertex. This section introduces an algorithm 
that handles the constraint given by cable length. 

First, we build the configuration space that represents the structure of the prob- 
lem and then compute a vector field that guides the robot in the configuration space. 
Given a static environment the configuration space is generated off-line, reducing 
the cost of online planning. Sections [3. 11 2 describe the space, its graph representa- 
tion and formalize the problem statement. Sections 13.31 4 introduce the algorithms 
for graph construction and planning. Section l375] proves algorithm correctness. 
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(a) Regular Space (b) Stitched Partial Spaces (c) 3D Overlap 

Fig. 2 Simple and overlapped manifold configuration space for tethered robots. 




Fig. 3 Example traversal of configuration space. 

3.1 Configuration Space 

In order to build a complete planner for tethered robots, we consider the configura- 
tion space. Notice that the space must distinguish distinct homotopic paths. Some 
configurations that have identical robot locations have different cable configura- 
tions. If we ignore collisions between the cable and obstacles, the configuration 
space is a 2D circular region defined by the 2D environment (Fig. [2^). However, 
this representation cannot distinguish configurations with different cable positions 
(Fig.[3]A,E). The cable location determines the region of space that is immediately 
reachable by the robot. In order to differentiate between A and E, we define con- 
figuration space by an overlapped manifold. The manifold is planar, but it can be 
visualized with stitched or overlapped free space components (Fig. IS?, c). 

Distinct configurations on the manifold with the same locations represent distinct 
cable configurations. A continuous region in the manifold corresponds to a set of 
configurations that can be reached by continuous robot motion. In Fig. [3] straight 
trajectories change the configuration from A to E via B, C, D. However, there is no 
straight path that can displace the robot from A to D or B to E. 

Notice that the number of overlaps in the manifold increases exponentially with 
each additional obstacle. The robot has two options for circumnavigating each ob- 
stacle. It can go around to the left or to the right. Hence, for n reachable obstacles, 
there exist at least 2" paths or cable routes that reach the same goal. This corre- 
sponds to at least 2" possible overlaps in the configuration space manifold. We say 
"at least" because winding the cable around an obstacle also doubles the overlaps. 
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3.2 Graph Representation of Configuration Space 

This section gives a formal representation of the problem domain and the problem 
statement. Our algorithm constructs a graph G = (V, E) that represents the config- 
uration space manifold (Section [33]) and then searches the graph (Section [3. 41 ) for 
vector fields or paths. The key challenge is to construct a graph that completely en- 
codes the configuration space. This section defines the graph properties that must be 
assured in graph construction. Section [331 verifies these properties. 

First, consider the domain definitions: Manifold vertices are located at physical 
grid nodes, vq is the vertex that represents the cable base. D/j is the manhattan 
distance on the grid between v,- and Vj. When Dij — 1, the two vertices are referred 
to as neighbors. Notice that neighboring vertices are not necessarily connected by an 
edge since they may be on distinct overlapping folds of the manifold. P,, represents 
paths between vertices on the manifold and |P,,| is path length. 
The problem is to build a graph G such that any shortest path from v,- to Vg corre- 
sponds to a shortest path from qi to any qg where pg is the target and the cable does 
not cross an obstacle. 

Definition 1. Two paths from vq to any point are path homotopic if and only if there 
exists no obstacle in the area enclosed by the two paths. Otherwise they are not 
homotopic or ahomotopic. 

While grid nodes are simply positions, p,-, manifold vertices, v,- are defined by the 
set of homotopic paths from vq to p,-. Each vertex is associated with a path of adja- 
cent vertices of length less than D„ax- In every set of homotopic paths, there exist 
minimal paths. Let us call them m-paths (mP{vo,Vi) or mPi). 

Definition 2. V2 is an m-child of vj (v2 >~ vi) and vi is an m-parent of V2 (vi -< vi) 
if and only if there exists a minimal path, mP2, where vi is the last vertex before V2. 

Definition 3. vi, V2 are m-adjacent (vi ~ V2) if and only if vi is an m-parent of V2 
or V2 is an m-parent of vi . 

Definition 4. Collocated vertices vi and V2 are manifold-equivalent, m-equivalent 
(vi = V2) if and only if every path to vi is homotopic to every path to V2. 

All m-equivalent vertices are collocated, vi = V2 => vi ~ v2. However, not all col- 
located vertices are m-equivalent. This occurs when the paths to vi go around some 
obstacle while those to V2 do not. In this case, the distance between vertices on the 
manifold can be greater than physical distance between their positions. 

Definition 5. The m-distance between vi and V2, mD\2 is the length of any shortest 
path between vi and V2 such that all consecutive nodes on the path are m-adjacent. 

Lemma 1. Let v\ and V2 be neighboring vertices such that D12 = 1. For any vertex, 
V3, i/-Di3 > D23 then strictly Di^ > Z)23- Likewise, mDij, > mD23 => mDi^ > mD23. 

Proof. Any paths /"n and P23 must have distinct parity since they are separated by 
one edge tSOi]. One path has even length while the other is odd. Hence 

lAsI^ 1^231- □ 
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Proposition 1 (Adjacency). For neighboring vertices: vi '^ vj (a) if and (b) only if 
there is no obstacle in the area enclosed by any mP\ and mP2. 

Proof. Since vi and V2 are neighbors, mDoi ^ mDo2 by Lemma [l] Without loss 
of generality, consider mDoi < mDoi- (a) For any path mPi, construct path Pi = 
{mPi , V2} . This is a minimal path to V2 with vi as the last vertex since mDoi < mDo2 
and \P2\ = \mPi | + 1. Hence, vi ~< V2 and vi ^ V2- (b) By contradiction: assume there 
exists an obstacle enclosed by some mPi , mP2. By the premise, vj ^ V2 and therefore 
vi ^ V2. Hence by Def. [21 there exists m/2 with vi as the last vertex and mPi as a 
sub-path. mPj and mP2 enclose an obstacle so they are not homotopic. Thus V2 ^ V2- 
Contradiction. Likewise if wDoi > mDo2- O 

Consider again the problem statement: Build a graph G such that any shortest path 
from Vi to Vg corresponds to a shortest path from qi to any qg s.t. the cable does 
not cross an obstacle. Following Proposition [TJ this graph must have the following 
property: two neighboring vertices are connected by an edge if and only if they are 
m-adjacent. Section [331 introduces our algorithm for constructing G. 



3.3 Manifold Construction: Forward Search 

Our algorithm in Fig. |4] incrementally adds vertices and builds graph edges by con- 
necting adjacent vertices to the north, south, east and west of each vertex. Suc- 
CESSORS(va, V,E) returns the set of neighboring, collision-free vertices that are not 
yet in the graph. Since we assume that obstacles occupy at least one grid node, 
CollisionFree(p) returns true when a node does not intersect an obstacle. 

Multiple manifold vertices can share a single grid position as in Fig.[3](A,E). Our 
algorithm, distinguishes grid positions, p,, from manifold vertices, v,. The position 
of vertex v, is obtained by Pos[v,]. The function NEIGHBORS (v,) returns the set of 
four neighboring positions of the vertex. The function Adjacent(v,) returns the set 
of all vertices in V that are adjacent/connected to v, inGas follows: {vj\3e{vi,Vj) £ 
E}. There are at most four such vertices. Likewise, Adjacent(Adjacent(v,)) 
returns at most eight vertices ^ v,- that are adjacent to the first four. 

BuildManifold is a variant of breadth-first search or wavefront expansion. 
Standard expansion adds all edges to existing neighbors when adding a new vertex. 
In contrast, we add an edge to a neighbor only when there is a common vertex that 
has edges to both the neighbor and the parent of the new vertex (Lines 10-13). This 
is illustrated by Fig. \5\ An edge is added between v, and Vb since both Va and Vh 
have edges to a common vertex Vc. However, an edge is not added between V; and 
Vd- Likewise, in Fig.|6l no edge is added between v, and V/i, generating the manifold 
with overlaps as presented in Section [3711 



3.4 Plan Generation: Backward Search 

Given the graph representing the configuration space, we construct a vector field to 
guide the robot from any given starting location to any desired goal. This is required 
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BuildManifold(vo,V,E) 

1 Q ^ Enqueue(vo) 

2 while Q / 

3 do Vfl ^ Dequeue(Q) 

4 S^ SUCCESSORS(va,V,E) 

5 for all V, e S 

6 do V^ 1nsert(v,) 

7 E ^ lNSERT(v,,Vfl) 

8 if Dist[v,] < DistMax 

9 then Q ^ Enqueue(v,) 

10 B ^ ADJACENT(ADJACENT(va)) 

11 forallv/,eB 

12 doif POS[vi] eNEIGHBORS(v,) 

13 thenE^ 1nsert(v,,v/,) 

SUCCESSORS(Vfl,V,E) 

1 N^NEIGHBORS(Vfl) 

2 S^0 

3 for all Pi e N 

4 do if CollisionFree(p,) and 

5 p; ^ POS[ADJACENT(va)] 

6 then v; ^ NewVertex 

7 POS[v;]^p; 

8 DIST[v,-] ^DlST[Vfl] + l 

9 S ^ Insert(v,) 
10 return S 

Fig. 4 Manifold Construction Pseudo-code. 
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Fig. 6 Illustration of overlap 



since the manifold is a roadmap that is created for all possible start and goal states. 
Basic dynamic programming or wavefront expansion is used compute a distance 
field over the configuration space starting from the target location. The gradient of 
the distance field is used to control the robot. Note that the target location can be 
associated with multiple vertices in the graph. Starting from all these vertices, we 
assign minimal distance values to the remaining vertices by breadth-first traversal. 
Consequently, the robot always follows the minimal path on the manifold. 



3.5 Algorithm Analysis 

This section analyzes the complexity, optimality and correctness of our algorithms. 
First of all, the computational complexity of manifold construction is 0{n) where 
n is the number of vertices in the configuration space. Likewise, the computational 
complexity of search is 0{n). Hence the entire algorithm is executed in 0{n). Fur- 
thermore, manifold generation must only be computed once for static environments, 
regardless of start state and goal. This yields efficient multi-query planning. 

Given that G is correctly constructed and completely represents the manifold that 
the robot can traverse then dynamic programming is a complete and optimal method 
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for finding a solution. Therefore plan generation is complete and optimal. The re- 
maining task is to prove the correctness and completeness of manifold construction. 
We will use Proposition|2]in the validation of BuildManifold in Proposition[3l 

Proposition 2 (Equivalence). Ifv\ ~ V2 are both m-adjacent to Va then v\ = V2- 

Proof. Let Pi and P2 be any paths to vi and V2. There exist shortest paths mPi,mP2 
homotopic to Pi, P2 respectively. Let mPa be a shortest path to Va- Since vi ^ Va and 
V2 ^ Va, by Prop.[T] there is no obstacle enclosed by {mPi,mPa} and {mP2,mPa}. 
Hence, there is no obstacle enclosed by mPi and mP2, so they are homotopic. Since 
vi ~ V2 and all paths to vi are homotopic to all paths to V2 we have vi = V2. □ 

Proposition 3. Prior to adding V3 with mD^-i >D< Dmax, BuildManifold main- 
tains the following invariant. Let vertices vi and V2 have tuDqi < D, mDQ2 < D- 

(a) vi e V if and only ifvi is not m-equivalent to any other vertex, V2 G V. 

(b) e{v\,V2) e E if and only ifvi is m-adjacent to V2 fvi ~ V2 j. 

Proof. We proceed by induction. Base case, D — I, there are no edges and the only 
vertex is vq, added in Line 1. The inductive step is split into the following Lemmas. 

For Lemmas [2][5] assume Prop. [3l Prior to adding any V3 such that mDos > D + 1.- 
(BMy and Sy refer to Line Y in BuildManifold and Successors respectively) 



Lemma 2. Ifvi G V then vi is not m-equivalent to any other vertex, V2 G V. 

Proof. BuildManifold adds vi to V by expanding Vc, only if Va has no edge to 
any vertex at its position, p\ (S5). By assumption, v„ is not m-adjacent to any vertex 
at p\. Hence, vi is the only vertex at p\ such that Va <v\. Therefore it is the only 
vertex with a minimal path mPi such that Vc, is the last vertex. D 

Lemma 3. If 3vi (mDoi < D) not m-equivalent to any other vertex in V then vi £ V. 

Proof. By contradiction: Suppose vi ^ V. vi is an m-child of some v„ where mDoa = 
mDoi — 1. By the assumption, mDoa < D so Va E V and by BM9, Va G Q. Since there 
are finite vertices with mDo^ < D, Va is dequeued and expanded in BM4. Since vi 
is a neighbor of v^ one of the following must hold: (1) By S4, vi's position is not 
collision free. Contradiction. (2) By S5 and the inductive assumption, v^ has an 
m-adjacent vertex, V2 ~ vi. By Prop. [21 V2 = vi. Contradiction. D 

Lemma 4. T/'E contains edge (vi , V2) then vi , V2 are m-adjacent. 

Proof. Without loss of generality, suppose vi is added after V2. An edge is added at 
(a)BM7 or (b)BM13. (a) V2 = Va and vi is newly defined and implicitly associated 
with minimal paths homotopic to mPi = {mPa^vi}. Since V2 is the last vertex on 
niP\, V2 ^ vi. (b) V2 — Vh. By BM7 there exists Va ^ vi. By BMIO and the inductive 
assumption there exists Vc such that Va ~ Vc ~ v^. Given that Vb ^ Vc and Vc ^ Va and 
Va ^v\. Prop. [T] States that there is no obstacle between any mP), and mPi as shown 
by regions {be), {ca),{a\) and/? inPig.Oa). Hence, by Prop. 1 V2 = v/, ~ vi. D 
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(a) 



(b) 



(c) 



Fig. 7 Illustrations for the proof of Lemmas I4I5I Straight lines indicate precise grid displace- 
ments. Curves represent paths that preserve relative position but not necessarily distance. 



Lemma 5. Ifvi,V2{mD(ji,mDQ2 < D) are m-adjacent, E contains edge (vj, V2). 

Proof. By the inductive assumption: vi , V2 G V. Without loss of generality, V2 <v\. 

(a) mDoi , mDQ2 < D then (vi , V2) € E by the inductive assumption. 
For the remaining cases otDqi = D and mDo2 = D — 1 by Lemma[T] 

(b) V2 = Va is the first m-parent of vi added to V. Then (vi , V2) € E by BM7. 

For the remaining cases there exists v^, -< vi [va / V2) that was added prior to V2. 
Since vi = vi, there are two minimal paths mPi{a) = {mPa.vi} and mPi{2) = 
{mP2,vi} that enclose a region R with no obstacles. By Prop. |2] there are three 
relative positions for V2 ^ Va- Due to symmetry of Fig.|7jb), that yields two cases. 

(c) In the case of Fig. IT^b) there exists v^, neighbor of v^ and V2 contained in R. 
Extend two straight paths /"^^ and P4C opposite Va and V2 respectively. Since 
R is closed, these paths must intersect mP2 and mPa at some vertices V3, V4 
respectively. Since f3£. is straight, \Pj,c\ < \P32\- Hence, the path S2 — {Poi,Vc,V2} 
has length 1^2 1 < \mP2\- Therefore v^ -< V2. Likewise, since P4C is straight, \P4c\ < 
\P4a\- Hence, Sa = {PQ4,Vc,Va} has length \Sa\ < \niPa\. Therefore Vc <V2- Thus 
there exists Vc such that V2 ~ Vc ~ v^. This satisfies BMlO-13, thus (vi , V2) G E. 

(d) In the case of Fig. |7|c) there exists Vc neighbor of vj that is contained in R. 
Extend a straight path Pt,c from V3 opposite vj. Since R is closed, Pj,c must 
intersect either mP2 or mPa at some vertex V3 respectively. Without loss of gen- 
erality, assume it intersects mP2. Since Pt,^ is straight, \P-ic\ < |^32|- Hence, the 
path S = {Po3,Vc,vi} has length \S2\ < \mP2\- Since 5 is a path from vq to vi 
homotopic to mP\{2), we have niDi < \mP2\ + 1. Thus mPi{2) is not a mini- 
mal path. Contradiction. Likewise, if P^c intersects mPa, we find mPi{a) is not 
a minimal path. Contradiction. 



In all valid cases where vi ^ V2, E contains the edge (vi , V2). 



n 



Lemma 6. The algorithm terminates when all v; : mDj < Z),„a;c (if^ added to Q. 
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Proof. Every new vertex, v,- increments Dist[v,] by 1 from its parent (S8). For any 
D there are a finite number of vertices that are not m-equivalent with niDi < D. 
Since no vertices are added to Q with Dist[v,] > DistMax (BM7) and each step 
dequeues, BuildManifold terminates. D 

By Lemmas |2]|6l BuildManifold is proven to add all the vertices on the man- 
ifold to V and all the edges between m-adjacent vertices to E prior to guaranteed 
termination. Therefore, the manifold generation algorithm is correct and complete. 

3.6 Implementation Details 

The presented algorithm computes Manhattan distance between the base and each 
vertex in the graph. This is a low-order approximation of physical distance. We 
therefore also allow diagonal moves when computing the distance value, creating 
an 8-connected lattice and obtaining a better approximation. The experiments in 
Section[5] demonstrate that it performs well in robot experiments. 

In order to include diagonal moves. Line 7 of SUCCESSORS uses d instead of 1, 
where d — \/2 for diagonally connected vertices. Furthermore, Q in BuildMani- 
fold is a priority queue rather than a FIFO in order to always select vertices with the 
minimal distance from vq. This approach increases computation time to 0{n\ogn) 
due to priority queue operations. 



4 Cable-Robot Interaction 

Section [3] introduced a novel formulation of the configuration space for tethered 
robots and presented a complete solution to path planning for robots that are re- 
stricted by cable length. The proposed configuration space allows us to go further 
and consider additional constraints on robot motion. In this section, we examine the 
case where the robot cannot cross the cable. Cable-robot collisions present further 
algorithmic challenges that are not solved by existing methods. We evaluate two 
solutions and propose a novel algorithm in Section |431 

4.1 Preliminary Algorithm 

Simple domains such as Fig.[8la) can be solved by adding the current cable shape 
as an obstacle to future motion|29]. We implemented an algorithm that incremen- 
tally removed vertices from the graph that were within the robot radius of the cable 
through wavefront expansion. The online controller continuously updated the vector 
field as the cable shape changed during motion. 

In our experiments, the initial path typically remained valid during robot motion 
because the deformation of the cable occurred behind the robot. When the plan be- 
came inaccessible, the system replanned the path. This approach required continu- 
ous tracking of cable shape. Since this is difficult in practical environments we used 
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Fig. 8 Illustration of a cable blocking the path to the target. 

physical simulation to predict the current shape of the cable based on the motion 
history of the robot. Section|5]shows that this approach works well in practice. 

Notice, however, that this simple method is not sufficient when the cable com- 
pletely blocks a path to the target as shown Fig. Ob). Removing cable vertices from 
the configuration space blocks all path to the goal. We present two approaches that 
handle such cases. First, we consider a hardware solution in which the system re- 
tracts the cable. Second, we introduce a novel algorithm for feasible path planning 
that clear blocks through auxiliary robot motion. 

4.2 Hardware Solution: Cable Retraction 

First of all, the problem in Fig. [HIb) can be solved by continuously retracting the 
cable to make the cable as short as possible while allowing free robot motion. This 
approach requires additional hardware, but simplifies planning. Given cable retrac- 
tion, the robot would simply need to follow the cable to the cable base until the path 
to the target is cleared. This can be accomplished by searching for a shortest path 
on the manifold to the base and directing the robot to move along that path. 

The hardware implementation is not trivial because one must develop a special 
device than retracts the cable with appropriate force for the particular robot and 
cable type. The force must be simultaneously strong enough to pull a long cable 
and sufficiently weak to allow robot motion. Furthermore, it may be necessary to 
constantly adjust the force depending on the robot and cable status. We have not yet 
implemented this solution, however it remains an exciting topic for our future work. 



4.3 Algorithmic Solution: Untangling 

Given that the robot cannot retract the cable mechanically, it must perform auxiliary 
motions to clear the path blocked by the cable. We refer to this procedure as untan- 
gling. Consider Fig. [He)- The robot must first move to the left to clear the path to 
the target on the right. More complex domains, such as Fig. |9j have goals that are 
blocked by the cable multiple times along a single path. The algorithm is required 
to find a sequence of untangling motions. In contrast to Section [3j evaluating all 
possible motions was computationally infeasible. Instead, we developed a heuristic 
method that efficiently computes untangling motions and performs well in practice. 
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Fig. 9 Auxiliary motions open the path to the target. Each step is computed by our algorithm. 




Candiates of 
Intermediate target 




Fig. 10 A search for the most promising candidate. Candidates (left) and their accessible 
regions (right). Candidate 4 is selected in this case. 



When our system identifies that there is no open path to the target from the cur- 
rent robot location, it selects an intermediate target and moves the robot towards it. 
Motion to the intermediate target is chosen to displace the blocking cable from the 
path of the robot to the goal (Fig.|9^). If the goal becomes accessible during travel to 
the target, the online algorithm discards the intermediate target and moves directly 
to the goal. If the intermediate target becomes inaccessible or if the robot reaches 
the target, the system computes the next target. This process repeats until the goal 
becomes accessible. Fig. [9] shows a complete untangling procedure. 

For each step in the untangling process, we choose the intermediate target from 
several candidates. The most promising one is selected by internal physics-based 
simulation as in Section l4n First, we identify the region in the configuration space 
accessible from the current robot position (Fig. [TOl . We then relate each vertex in 
the configuration space graph to the minimum of the distance from the vertex to 
the region's graph center and that to the current robot position. Local maxima of 
the computed distances are chosen as candidates (Fig.[TO]left). For each candidate, 
we compute a simulated robot motion where the robot moves towards the candidate 
pulling the cable, and test whether or not the motion clears the path. We use a simple 
spring-mass model to simulate the behavior of a cable. If a candidate clears the path 
in simulation, we select the candidate as the intermediate target. If no candidate 
clears the path to the goal, we choose the candidate that is expected to maximize the 
accessible region after the robot arrives at the target (Fig. FTOl right). 
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The proposed algorithm is heuristic and is not guaranteed to find a solution if 
there is a solution. An alternative, systematic approach is to construct a search tree 
by recursively sampling candidates for the intermediate targets and search for a 
successful sequence of intermediate targets as in many path planning algorithms 
yj]. We did not implement such systematic approaches because our simple heuristic 
successfully found a path to the target via multiple intermediate targets in our ex- 
periments (SectionO when there was a solution. When there is no solution, neither 
our heuristic method nor systematic approach can find one. 

4.4 Deadlock Prevention 

The algorithm described in the previous subsection cannot find a path when the 
robot is already trapped in a deadlock configuration as shown in Fig. [TT] (left). The 
robot is trapped in the closed region and none of the auxiliary motions described 
above are able to open the way to the goal. To prevent this problem from occurring, 
we augmented the algorithm with a preprocessing step that removes configurations 
that can cause deadlocks from G. We then use the previously described runtime 
algorithms to find a deadlock free path to the target. This algorithm preforms well 
(Section [5]) but does not guarantee deadlock avoidance. It is our future work to 
develop a complete run-time algorithm for deadlock prevention. 

Starting from each graph node, the algorithm follows the path to the cable base 
by picking each adjacent vertex with minimum distance to the base. It identifies 
graph vertices that are in contact with an obstacle, yet their parents are not adjacent 
to an obstacle. These contact vertices are potential locations where a deadlock can 
occur (stars in Fig. fTTJ right). 

Having identified contact vertices, the system examines whether or not the con- 
tacts are resolvable as follows. First, we compute a region in the configuration space 
separated by the path to the cable base and accessible from the contact vertex (gray 
area in Fig.fTTIl. We then compare the maximum distance to any vertex in the region 
from the contact vertex and the remaining cable length. This is computed by sub- 
tracting the distance from the cable base to the contact node from the overall cable 
length. If the maximum distance is longer than the remaining cable length, then the 



Robot 
O 





Fig. 11 Deadlock configuration (left) and the detection of potential deadlocks (right). Left 
contact (star) is resolvable, but the right contact (star) is not resolvable. When a potential 
deadlock is detected, we remove the affected area from the configuration space (shaded area). 
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contact is resolvable by moving the robot to the most distant position. Otherwise, the 
contact causes a deadlock. In this case, our system prevents the robot from causing 
the deadlock by removing all the configuration space vertices in the accessible area 
for which the distance longer than the one to the contact points (shaded in Fig.fTTI). 



5 Experiments 

In order to validate the practical effectiveness of the proposed algorithms, we con- 
ducted a series of experiments on a physical robot. We examined the basic case 
involving overhead tracking for robot position and a robot with no hardware for ca- 
ble retraction. The cable configuration was not tracked but predicted by means of 
internal physics-based simulations for the algorithms in Section|4l Hence, our robot 
was not guaranteed to avoid cable-robot collision 100%. However, the experiments 
demonstrate that our algorithm significantly reduced the occurrence of collisions. 

We evaluated the proposed algorithm using a cabled robot on a flat floor. An 
iRobot CREATE robot was connected to a cable that provided power and control 
signal for a total of 5 bundled wires. The location of the robot was tracked by a 
vision-based motion capture system (Motion Analysis). The system consisted of 8 
infra-read high speed cameras that observe the motion of retro-reflective markers 
attached to the robot. The control PC (Dell Latitude) received the robot location 
from motion capture and sent control commands to the robot via the cable. 

Fig. [12] gives an overview of the physical environment. It is a standard office 
floor covered by carpet. The layout mimics an open office or home environment 
with obstacles such as columns and furniture. The size is 5m x 3m. Our algorithm 
represented this space with 50 x 30 grid. Fig. [13] shows the layouts used in the ex- 
periment. In each trial, the robot was placed near the cable base with a compactly 
assembled cable. It visited six given targets in a given order. The system judged that 
a target visit was complete when the distance between the robot center and the target 
center was less than the robot diameter. We ran 10 trials for every combination of 
a given algorithm and layout. We prepared a set of 10 random permutations of 6 
targets and used the same set for all combinations. 





Fig. 12 The physical environment and the cabled robot used in our experiments. 
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Fig. 13 Experimental layouts: dark gray circles are obstacles and plus marks are targets. 





Fig. 14 Configuration space boundaries for two experimental layouts. 
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Fig. 15 Sample robot experiment with untangling: a) Initial configuration and goal (red circle) 
b) Approaching the first intermediate target (red dot) c) Approaching the second intermediate 
target, d) Arriving at the second intermediate target, e-f) Approaching and arriving at the goal. 



5.7 Experimental Results 



Table 1 shows the statistics of our results. The basic algorithm completed the tasks 
with 100% success. The extended algorithm without deadlock prevention failed in 
some cases (50 — 80% success). However, adding deadlock prevention achieved 
100% success. Collisions between the robot and the cable did occur even when we 
used the extended algorithm. However the number of collisions was significantly 
reduced compared with the basic one. Fig. [14] shows the configuration space for 
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Table 1 Results from the experiments. We ran 10 trials for each combination of algorithm x 
task. 





Basic Algorithm 


Extended Algorithm 


Deadlock Prevention 


Taskl 


Task2 


Tasks 


Taskl 


Task2 


Tasks 


Taskl 


Task2 


Tasks 


Success Ratio 


100% 


100% 


100% 


80% 


50% 


80% 


100% 


100% 


100% 


Average Time (s) 


83.9 


100.4 


88.5 


100.9 


103.2 


93.0 


89.4 


110.4 


92.5 


Avg. Cable-Robot 
Collisions 


1.8 


1.7 


2 


0.25 


0.4 


0.125 


0.4 


0.3 


0.2 



the first two layouts. Fig. [15] shows an example of untangling observed during the 
experiments. It demonstrates that our algorithm successfully identified an appropri- 
ate sequence of intermediate targets. Video of the experiments and demonstration 
software are available at: http;//www.designinterface.jp/en/projects/cable. 

6 Conclusion 

Our work shows that path planning for cabled robots yields significant insight into 
homotopic path planning. We developed a configuration space formulation that dis- 
tinguishes between robot positions with distinct cable configurations. We proposed 
complete algorithms that compute the configuration space manifold and plan opti- 
mal paths given cable length constraints. Furthermore, we studied a practical exten- 
sion of our algorithm given that the robot is not permitted to cross its cable. These 
algorithms were validated on a real robot platform in a series of experiments. 

This paper opens the door to numerous variations of planning homotopic paths 
and cabled robotics. Immediate future work is the development of runtime looka- 
head detection of deadlocks. An interesting variant is path planning for robots that 
grasp or push the cable Bill . Another interesting problem is optimal placement of 
the cable base for a given environment to minimize deadlocks. Similar analysis 
would identify problematic obstacles that can cause deadlocks and warn the user. 
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An Equivalence Relation for Local Path Sets 

Ross A. Rnepper, Siddhartha S. Srinivasa, and Matthew T. Mason 



Abstract. We propose a novel enhancement to the task of collision-testing a set of 
local paths. Our approach circumvents expensive collision-tests, yet it declares a 
continuum of paths collision-free by exploiting both the structure of paths and the 
outcome of previous tests. We define a homotopy-like equivalence relation among 
local paths and provide algorithms to (1) classify paths based on equivalence, and 
(2) implicitly collision-test up to 90% of them. We then prove both correctness and 
completeness of these algorithms before providing experimental results showing a 
performance increase up to 300%. 



1 Introduction 

Planning bounded-curvature paths for mobile robots is an NP-hard problem ll22l . 
Many nonholonomic mobile robots thus rely on hierarchical planning architec- 
tures HKTSjim, which split responsibility between at least two layers (Fig. [T]): a 
slow global planner and fast local planner. We focus here on the local planner (Alg.[T] 
and Alg.O, which iterates in a tight loop; searching through a set of paths and se- 
lecting the best path for execution. During each loop, the planner tests many paths 
before making an informed decision. The bottleneck in path testing is collision- 
testing 11241 . In this paper, we introduce a novel approach that delivers a significant 
increase in path set collision-testing performance by exploiting the fundamental ge- 
ometric structure of paths. 

We introduce an equivalence relation intuitively resembling the topological no- 
tion of homotopy. Two paths are path homotopic if a continuous, collision-free de- 
formation with fixed start and end points exists between them Il20l . Like any path 
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Fig. 1 An example hierar- 
chical planning scenario. 
The local planner's path set 
expands from the robot, at 
center, and feeds commands 
to the robot based on the 
best path that avoids obsta- 
cles (black squares). The 
chosen local path (green) 
and global path (red) com- 
bine to form a proposed path 
to the goal. 

equivalence relation, homotopy partitions paths into equivalence classes. Different 
homotopy classes make fundamentally different choices about their route amongst 
obstacles. However, two mobile robot concepts translate poorly into homotopy the- 
ory: limited sensing and constrained action. 

The robot may lack a complete workspace map, which must instead be con- 
structed from sensor data. Since robot perception is limited by range and occlusion, 
a robot's understanding of obstacles blocking its movement evolves with its vantage 
point. A variety of sensor-based planning algorithms have been developed to handle 
such partial information. Obstacle avoidance methods, such as potential fields lfT2l . 
are purely reactive. The bug algorithm fTSl, which generates a path to the goal us- 
ing only a contact sensor, is complete in 2D. Choset and Burdick [Sj present the 
hierarchical generalized Voronoi graph, a roadmap with global line-of-sight acces- 
sibility that achieves completeness in higher dimensions using range readings of the 
environment. 

If a robot is tasked to perform long-range navigation, then it must plan a path 
through unsensed regions. A low-fidelity global planner generates this path because 
we prefer to avoid significant investment in this plan, which will likely be invalidated 
later. Path homotopy, in the strictest sense, requires global knowledge of obstacles 
because homotopy equivalent paths must connect fixed start and goal points. 

Relaxing the endpoint requirement avoids reasoning about the existence of far- 
away, unsensed obstacles. Naively relaxing a fixed endpoint, our paths might be 
permitted to freely deform around obstacles, making all paths equivalent. To re- 
store meaningful equivalence classes, we propose an alternate constraint based on 
path shape. This is in keeping with the nonholonomic constraints that limit mo- 
bile robots' action. Laumond lISll first highlighted the importance of nonholonomic 
constraints and showed that feasible paths exist for a mobile robot with such con- 
straints. Barraquand and Latombe |l2| created a grid-based planner that innately 
captures these constraints. LaValle and Kuffner (TT\ proposed the first planner to 
incorporate both kinodynamic constraints and random sampling. In contrast to non- 
holonomic constraints, true homotopy forbids restrictions on path shape; two paths 
are equivalent if any path deformation — however baroque — exists between them. 
By restricting our paths to bounded curvature, we represent only feasible motions 
while limiting paths' ability to deform around obstacles. The resulting set of path 



An Equivalence Relation for Local Path Sets 



21 





Fig. 2 left: Paths from a few distinct homotopy classes between the robot and the goal. The 
distinctions between some classes require information that the robot has not yet sensed (the 
dark area is out of range or occluded), middle: With paths restricted to the sensed area, they 
may freely deform around visible obstacles, right: After restricting path shape to conform to 
motion constraints, we get a handful of equivalence classes that are immediately applicable 
to the robot. 

equivalence classes is of immediate importance to the planner (Fig.O. The number 
of choices represented by these local equivalence classes relates to Farber's topo- 
logical complexity of motion planning (U\. 

Equivalence classes have been employed in various planners. In task planning, re- 
cent work has shown that equivalence classes of actions can be used to eliminate re- 
dundant search f71|. In motion planning, path equivalence often employs homotopy. 
A recent paper by Bhattacharya, Kumar, and Likhachev IS provides a technique 
based on complex analysis for detecting homotopic equivalence among paths in 2D. 
Two papers employing equivalence classes to build probabilistic roadmaps fTTI are 
by Schmitzberger, et al. 125)1 and Jaillet and Simeon ifTOll . The latter paper departs 
from true homotopy by proposing the visibility deformation, a simplified alternative 
to homotopic equivalence based on line-of-sight visibility between paths. 

Our key insight is that local path equivalence is an expressive and powerful tool 
that reveals shared outcomes in collision-testing. Specifically, two equivalent neigh- 
boring paths cover some common ground in the workspace, and between them lies 
a continuum of covered paths. We develop the mathematical foundations to detect 
equivalence relations among all local paths based on a finite precomputed path set. 
We then utilize these tools to devise efficient algorithms for detecting equivalence 
and implicitly collision-testing local paths. 

The remainder of the paper is organized as follows. We provide an implementa- 
tion of the basic algorithm in Section |2] and present the fast collision-testing tech- 
nique. Section [3] then explores the theoretical foundations of our path equivalence 
relation. Section|4]provides some experimental results. 



2 Algorithms 

In this section, we present three algorithms: path set generation, path classification, 
and implicit path collision- testing. All of the algorithms presented here run in poly- 
nomial time. Throughout this paper, we use lowercase p to refer to a path in the 
workspace, while !P is a set of paths (each one a point in path space). 
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Algorithm 1. Test_All_Paths(H', ?) 

Input: w - a costmap object; CP - a fixed set of paths 

Output: 7 free^ the set of paths that passed colhsion test 

1: y free ^9 

2: while time not expired and untested paths remain do // test paths for 0. 1 seconds 

3: p^ GetMe.xtJ'ath{'J') 

4: collision*— w.TestJ'ath{p) // collision is boolean 

5: if not coZ/«ion then 

6: '3' free ^ "^free U {p} II non-colliding path set 

7: return 7 free 



Algorithm 2. Local_Planner_Algorithm(H', x, h, T) 

Input: w - a costmap object; x - initial state; li-a heuristic function for selecting a path to 
execute; 

y - a fixed set of paths 
Output: Moves the robot to the goal if possible 

1: 

2: 

3: 

4: 

5: 



while not at goal and time not expired do 

'3' free ^ Test MlJPaths{w,x,'J') 
j ^ h.BestJ'ath{x, 7 free) 
Execute J'athJDnJiobot{j) 
X <— Predict Mext ^tate{x, j) 



Dehnltion 1. Path space is a metric space (CP,/i) in which the distance between a 
pair of paths in !P is defined by metric jl. Paths can vary in shape and length. D 

2.1 Path Set Generation 

We use the greedy path set construction technique of Green and Kelly [8J, outlined 
in Alg. [3] The algorithm iteratively builds a path set Tn by drawing paths from a 
densely-sampled source path set, X. During step /, it selects the path p € X that 
minimizes the dispersion of T, = CP,_i U {p}. Borrowing from Niederreiter 121}: 

Definition 2. Given a bounded metric space (X, ^) and point set 7 = {xi ,... ,xn} S 
X, the dispersion of T in X is defined by 

5(J',X) = supmin/j(x,/7) (1) 

O 

The dispersion of T in X equals the radius of the biggest open ball in X containing 
no points in 7. By minimizing dispersion, we ensure that there are no large voids 
in path space. Thus, dispersion reveals the quality of 7 as an "approximation" of 
X because it guarantees that for any x e X, there is some point p ^ 7 such that 
^iix,p)<5i7,X). 

The Green-Kelly algorithm generates a sequence of path sets CP,, for / £ { 1 , . . . , A^}, 
that has monotonically decreasing dispersion. Alg. [T] searches paths in this order at 
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Algorithm 3. Green JCelly(X, N) 

Input: DC - a densely-sampled, low-dispersion path set; A' < |DC| - the target path set size 
Output: path sequence CP^ of size A' 



^0^0 

while n < N do 

n <— «-|- 1 

p <— argmin 5(5'„_i U {x}, X) 

return CP^f 



runtime, thus permitting early termination while retaining near-optimal results. Note 
that while the source set X is of finite size — providing a lower bound on dispersion 
at runtime — it can be chosen with arbitrarily low dispersion a priori. 

2.2 Path Classification 

We next present Alg. H] which classifies collision-free members of a path set. The 
Hausdorff metric is central to the algorithm. Intuitively, this metric returns the great- 
est amount of separation between two paths in the workspace. From Munkres ll20ll : 

^^H{pi,Pj) = inf{e : Pi C {pj)e and pj C {pi)e}, (2) 

where {p)r denotes dilation of p by r: {t &M? : \\tp — f ||l2 < '" for some tp e p}. 
Note that jj.h satisfies all properties of a metric |[9j. For our fixed path set generated 
by Green- Kelly, we precomputed each pairwise path metric value of ([2| and stored 
them in a lookup table for rapid online access. 

Alg. 2] performs path classification on a set of paths that have already tested 
collision-free at runtime. We form a graph G = {V,E) in which node v,- £ V cor- 
responds to path Pi. Edge e,y £ E joins nodes v,- and Vj when this relation holds: 

IJ-H{pi,Pj)<d, (3) 

where d is the diameter of the robot. Taking the transitive closure of this relation, 
two paths Pa and pi, are equivalent if nodes Va and v^ are in the same connected 
component of G (Fig.[3]l. 

In effect, this algorithm constructs a probabilistic roadmap (PRM) in the path 
space instead of the conventional configuration space. A query into this PRM tells 
whether two paths are equivalent. As with any PRM, a query is performed by adding 
two new graph nodes Vj and Vg corresponding to the two paths. We attempt to join 
these nodes to other nodes in the graph based on ([3]). The existence of a path con- 
necting Vs to Vg indicates path equivalence. 
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Fig. 3 A simple path set, 
in which obstacles (black) 
eliminate colliding paths. 
The collision-free path 
set has three equivalence 
classes (red, green, and 
blue). In the correspond- 
ing graph representation, 
at right, adjacent nodes 
represent proximal paths. 
Connected components in- 
dicate equivalence classes 
of paths. 




Algorithm 4. Equivalence_Classes(J'/ree, d) 



Input: y/iee - a set of safe, appropriate paths; d - the diameter of the robot 
Output: D -a partition of Vfree into equivalence classes (a set of path sets) 



LetG=(K,£)^(0,0) 
O^0 

for all Pi e 'J' free do 
V.add{pi) 
for all pj eV\ Pi do 

'iijiH{pi,Pj) <rfthen 
E.add(i,i) 

h < y free 

while S 7^ do 

e^0 

p <— a member of S 
L^{p} 
while iL 7^ do 

p ^ a member of Cj 

e^eu{p} 
s ^ s - {p} 

return D 



II This loop discovers adjacency 
// Add a graph node corresponding to path p, 



// Connect nodes ; and j with an unweighted edge 
// This loop finds the connected components 

// List of nodes to be expanded in this class 



// Commit p to class 



'ors{p)) ns 



2.3 Implicit Path Safety Test 

There is an incessant need in motion planning to accelerate collision-testing, which 
may take 99% of total CPU time ll24ll . During collision-testing, the planner must 
verify that a given swath is free of obstacles. 

Definition 3. A swath is the workspace area of ground or volume of space swept 
out as the robot traverses a path. D 



Definition 4. We say a path is safe if its swath contains no obstacles. 



D 
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Algorithm 5. Test_PathJmplicit(/7, w, §, d) 

Input: p is a path to be tested 

Input: w is a costmap object // used as a backup when path cannot be implicitly tested 

Input: S is the set of safe paths found so far 

Input: d is the diameter of the robot 

1: 

2: 

3: 

4: 

5: 

6: 



for all PhPj £ S such that jj.H{pi,Pj) <d do 

it p.IsJietween{pj,pj) then // p's swath has been tested previously 

Sf <— p.Get JindJ'ointO 

collision <— w.TestJ'oint{sf) II endpoint may not be covered by swaths 

return collision 
return w.Test J'athl^p) // Fall back to explicit path test 



In testing many swaths of a robot passing through space, most planners effectively 
test the free workspace many times by testing overlapping swaths. We may test a 
path implicitly at significant computational savings by recalling recent collision- 
testing outcomes. We formalize the idea in Alg.[5] which is designed to be invoked 
from Alg.[Tl line|4]in lieu of the standard path test routine. 

The implicit collision-test condition resembles the neighbor condition ^ used 
by Alg. m but it has an additional "Is_Between" check, which indicates that the 
swath of the path under test is covered by two collision-free neighboring swaths. 
The betweenness trait can be precomputed and stored in a lookup table. Given a set 
of safe paths, we can quickly discover whether any pair covers the path under test. 
Experimental results show that this algorithm allows us to test up to 90% of paths 
implicitly, thus increasing the path evaluation rate by up to 300% in experiments. 



3 Foundations 

In this section, we establish the foundations of an equivalence relation on path 
space based on continuous deformations between paths. We then provide correct- 
ness proofs for our algorithms for classification and implicit collision-testing. 

We assume a kinematic description of paths. All paths are parametrized by 
a shared initial pose, shared fixed length, and individual curvature function. Let 
Ki{s) describe the curvature control of path ; as a function of arc length, with 
maxo<5<5, 1 Ki{s) I < K,nax- Typical expressions for k",- include polynomials, piecewise 
constant functions, and piecewise linear functions. The robot motion produced by 
control i is a feasible path given by 



[0/1 




Ki 


Xi 


= 


COS 0/ 


yi 




sin 0, 



(4) 



Definition 5. A feasible path has bounded curvature (implying C' continuity) and 
fixed length. The set 3^{sf, K,„ax) contains all feasible paths of length Sf and curva- 
ture |k:(5)| < )C,„„. □ 
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Fig. 4 At top: several 
example paths combin- 
ing different values of v 
and w. Each path pair 
obeys l|3j. The value of 
V affects the "curviness" 
allowed in paths, while 
w affects their length. 
At bottom: this plot, gen- 
erated numerically, approxi- 
mates the set of appropriate 
choices for v and w. The 
gray region at top right must 
be avoided, as we show in 
Lemma |2] Such choices 
would permit an obstacle 
to occur between two safe 
paths that obey ^. A path 
whose values fall in the 
white region is called an 
appropriate path. 
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3.1 Properties of Paths 

In this section, we establish a small set of conditions under which we can quickly de- 
termine that two paths are equivalent. We constrain path shape through two dimen- 
sionless ratios relating three physical parameters. We may then detect equivalence 
through a simple test on pairs of paths using the Hausdorff metric. 

These constraints ensure a continuous deformation between neighboring paths 
while permitting a range of useful actions. Many important classes of action sets 
obey these general constraints, including the line segments common in RRT f\n\ and 
PRM planners, as well as constant curvature arcs. Fig.[T]illustrates a more expressive 
action set ifTSJI that adheres to our constraints. 

The three physical parameters are: d, the diameter of the robot; Sf, the length of 
each path; and r„,„,, the minimum radius of curvature allowed for any path. Note 
that l/rmin = Kmax, the upper bound on curvature. For non-circular robots, d reflects 
the minimal cross-section of the robot's swath sweeping along a path. We express 
relationships among the three physical quantities by two dimensionless parameters: 



d 



^f 



2nr,, 



We only compare paths with like values of v and vv. Fig.|4|top) provides some intu- 
ition on the effect of these parameters on path shape. Due to the geometry of paths, 
only certain choices of v and w are appropriate. 

Definition 6. An appropriate path is a feasible path conforming to appropriate val- 
ues of V and vv from the proof of Lemma|2l Fig.|4]previews the permissible values. 
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When the condition in (O is met, the two paths' swaths overlap, resulting in a con- 
tinuum of coverage between the paths. This coverage, in turn, ensures the existence 
of a continuous deformation, as we show in Theorem[Tl but first we formally define 
a continuous deformation between paths. 

Definition 7. A continuous deformation between two safe, feasible paths /?, and pj 

in 9^(5/, K„,ax) is a continuous function/: [0, 1] -^ 3^(^7, T^max)^ ^i'^h 57 slightly less 
than Sf and k:+^,^- slightly more than Kmax- /(O) is the initial interval of /?,■, and /(I) 
is the initial interval of pj, both of length ^7. We write pi ^ pj to indicate that 
a continuous deformation exists between paths p,- and pj, and they are therefore 
equivalent. D 

The length si depends on v and vv, but for typical values, 57 is fully 95-98% of 
Sf. For many applications, this is sufficient, but an application can quickly test the 
remaining path length if necessary. Nearly all paths /(c) are bounded by curvature 
Kmax, but it will tum out that in certain geometric circumstances, the maximum 
curvature through a continuous deformation is up to k",+,-^- = | Knax- 

Definition 8. Two safe, feasible paths that define a continuous deformation are 
called guard paths because they protect the intermediate paths. D 

In the presence of obstacles, it is not trivial to determine whether a continuous de- 
formation is safe, thus maintaining equivalency. Rather than trying to find a defor- 
mation between arbitrary paths, we propose a particular condition under which we 
show that a bounded-curvature, fixed-length, continuous path deformation exists, 

IJ.h{pi,P2) <d =^ pi ^ p2- (5) 

This statement, which we prove in the next section, is the basis for Alg.|4]and Alg.|5] 
The overlapping swaths of appropriate paths pi and p2 cover a continuum of inter- 
mediate swaths between the two paths. Eqn. (|5ll is a proper equivalence relation 
because it possesses each of three properties: 

• refiexivity. jj.h{p,p) = 0; /? is trivially deformable to itself. 

• symmetry. The Hausdorff metric is symmetric. 

• transitivity. Given }Xh{p\tP2) <d and jj.h{p2,P3) <d, a continuous deforma- 
tion from pi to pj, passes through p2. 

3.2 Equivalence Relation 

Having presented the set of conditions under which (|5]l holds, we now prove that 
they are sufficient to ensure the existence of a continuous deformation between two 
neighboring paths. Our approach to the proof will be to first describe a feasible 
continuous deformation, then show that paths along this deformation are safe. 

Given appropriate guard paths pi and pj with common origin, let pe be the short- 
est curve in the workspace connecting their endpoints without crossing either path 
{pe may pass through obstacles). The closed path B = piU pj U pe creates one or 
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Fig. 5 Paths Pi,Pj, and pg 
form boundary B. Its inte- 
rior, /, contains all paths in 
the continuous deformation 
from Pi to pj. 

more closed loops (the paths may cross each other). By the Jordan curve theorem, 
each loop partitions M? into two sets, only one of which is compact. Let /, the inte- 
rior, be the union of these compact regions with B, as in Fig.jS] 

Definition 9. A path pc is between paths /?,■ and pj if p^ C /. D 

Lemma 1. Given appropriate paths pt.pj C 3^(5/-, K,r,ax) with iJ.H{pi,Pj) < d, apath 
sequence exists in the form of a feasible continuous deformation between pi and pj. 

Proof. We provide the form of a continuous deformation from p,- to pj such that 
each intermediate path is between them. With t a workspace point and p a path, let 



r{t,p)^mf{e:te{p)e} (6) 

,, [[0,1] if Y{t, Pi) = rit,pj)=0 

g{n = < ( y(t,pi) \ 

y I r(f.;'/)+r(f.P;) J 



(7) 
otherwise. 



where g{t) is a set- valued function to accommodate intersecting paths. Each level 
set g{t) = c for c £ [0, 1] defines a weighted generalized Voronoi diagram (GVD) 
forming a path as in Fig. |6l We give the form of a continuous deformation using 
level sets g^'(c); each path is parametrized starting at the origin and extending for 
a length s7 in the direction of pg. Let us now pin down the value of ^7 . Every point 
ti on Pi forms a line segment projecting it to its nearest neighbor tj on pj (and vice 
versa). Their collective area is shown in Fig. [T) Eqn. (O bounds each segment's 
length at d. .jT is the greatest value such that no intermediate path of length 57 
departs from the region covered by these projections. 

For general shapes in M?, the GVD forms a set of curves meeting at branching 
points 123 J . In this case, no GVD cusps or branching points occur in any interme- 
diate path. Since d < r„,i„, no center of curvature along either guard path can fall in 
/ 101 . Therefore, each level set defines a path through the origin. 

Each path's curvature function is piecewise continuous and everywhere bounded. 
A small neighborhood of either guard path approximates constant curvature. A GVD 
curve generated by two constant-curvature sets forms a conic section fFf\. Table [T] 
reflects that the curvature of pc is everywhere bounded with the maximum possible 
curvature being bounded by | K,„ax- For the full proofs, see llT4l . D 

Lemma 2. Given safe, appropriate guard paths Pi,Pj G S'isf, Kmax) separated by 
l^niPiiPi) < d, any path pc C 3^(.s7, -^Knux) between them is safe. 

Proof. We prove this lemma by contradiction. Assume an obstacle lies between pi 
and Pj. We show that this assumption imposes lower bounds on v and w. We then 
conclude that for lesser values of v and w, no such obstacle can exist. 
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Fig. 6 In a continuous de- 
formation between patlis p,- 
and pj, as defined by the 
level sets of Q, each path 
takes the form of a weighted 
GVD. Upper bounds on cur- 
vature vary along the defor- 
mation, with the maximum 
bound of ^Kmax occurring 
at the medial axis of the two 
paths. 





Fig. 7 Hausdorff coverage (overlapping red and blue shapes in center) is a conservative ap- 
proximation of swath coverage (gray). The Hausdorff distance between paths p,- and pj is 
equal to the maximum-length projection from any point on either path to the closest point on 
the opposite path. Each projection implies a line segment. The set of projections from the top 
line (blue) and bottom line (red) each cover a solid region between the paths. These areas, 
in turn, cover a slightly shorter intermediate path pc, in white, with its swath in cyan. This 
path's length, s^ is as great as possible while remaining safe, with its swath inside the gray 



Table 1 Conic sections form the weighted Voronoi diagram. k"i and K"2 represent the cur- 
vatures of the two guard paths, with k"i the lesser magnitude. Let K„, = maxdiCil, |k"2|). For 
details, see 1141 . 



Type 


Occurrence 


Curvature bounds of intermediate paths 


line 


Ki=-K2 


\k\ <k,„ 


parabola 


Ki=0,K2^0 


\k\ <K,n 


hyperbola 


K1K2 <0,Ki^ -K2 


\k\ <K„ 


elUpse 


JCl K-2 > 


\k\ < ^K„, 



Let sl{p,d) — {t & M^jfp = nn{t,p) : t^ ± p and \\t — tp\\i2 < f } define a con- 
servative approximation of a swath, obtained by sweeping a line segment of length 
d with its center along the path. Tj^ is the line segment joining tp to t and nn{t,p) 
is the nearest neighbor of point / on path p. The two swaths form a safe region, 
U = sl{pi,d) Usl{pj,d). 

Suppose that U contains a hole, denoted by the set h, which might contain an 
obstacle. Now, consider the shape of the paths that could produce such a hole. 
Beginning with equal position and heading, they must diverge widely enough to 
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(a) 



Fig. 8 (a) With bounded curvature, there is a lower bound on path lengths that permit a 
hole, h, while satisfying ([3}- Shorter path lengths ensure the existence of a safe continuous 
deformation between paths, (b) We compute the maximal path length that prevents a hole 
using Vendittelli's solution to the shortest path for a Dubins car. Starting from the dot marked 
s, we find the shortest path intersecting the circle D. The interval p? illustrates path lengths 
permitting a hole to exist. 

separate by more than d. To close the loop in U, the paths must then bend back 
towards each other. Since the paths separate by more than d, there exist two open 
intervals p[' C pi and p''- C pj surrounding the hole on each path such that (at this 
point) p'j (f. {pj)d and p'j (f. {j>i)d- To satisfy ^, there must exist later intervals 
p\ C Pi such that p^: C {p'i)d and likewise /jy C pj such that pJ' C {p''j)d, as in 
Fig. Hi. 

How long must a path be to satisfy this condition? Consider the minimum length 
solution to this problem under bounded curvature. For each point t £ p''-, the interval 
pI must intersect the open disc D = int((f)^/), as in Fig.[8j3. Since p''- grows with the 
width of h, and pf must intersect all of these open neighborhoods, the path becomes 
longer with larger holes. We will therefore consider the minimal small-hole case. 

Vendittelli 1261 solves the shortest path problem for a Dubins car to reach a line 
segment. We may approximate the circular boundary of D by a set of arbitrarily 
small line segments. One may show from this work that given the position and slope 
of points along any such circle, the shortest path to reach its boundary (and thus its 
interior) is a constant-curvature arc of radius r„„„ . In the limit, as v approaches one 
and the size of h approaches zero, the length of arc needed to satisfy lO approaches 
n/2 from above, resulting in the condition that w > 0.48. Thus, for w < 0.48 and 
V G [0, 1), pc is safe. For smaller values of v, D shrinks relative to r„„„, requiring 
longer paths to reach, thus allowing larger values of w as shown in the plot in Fig.lD 

We have shown that there exist appropriate choices for v and w such that (O 
implies that U contains no holes. Since U contains the origin, any path pc G / ema- 
nating from the origin passes through U and is safe. D 

Theorem 1. Given safe, appropriate guard paths Pi,Pj G ^{sf, f^max), cmd given 
jJ.H{pi,Pj) <d, a safe continuous deformation exists between pi and pj. 
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Proof. Lemma [T] shows that Q gives a continuous deformation between paths pi 
and pj such that each intermediate path pc C / is feasible. Lemma[2] shows that any 
such path is safe. Therefore, a continuous deformation exists between /?, and pj. 
This proves the validity of the Hausdorff metric as a test for path equivalence. D 

3.3 Resolution Completeness of Path Classifier 

In this section, we show that Alg.|4]is resolution complete. Resolution complete- 
ness commonly shows that for a sufficiently high discretization of each dimension 
of the search space, the planner finds a path exactly when one exists in the contin- 
uum space. We instead show that for a sufficiently low dispersion in the infinite- 
dimensional path space, the approximation given by Alg.|4]has the same connectiv- 
ity as the continuum safe, feasible path space. 

Let 'J be the continuum feasible path space and 'J free C ? be the set of safe, 
feasible paths. Using the Green-Kelly algorithm, we sample offline from 3^ a path 
sequence 7 of size A^. At runtime, using Alg. [T] we test members of !P in order to 
discover a set "ffree C T of safe paths. 

The following lemma is based on the work of LaValle, Branicky, and Linde- 
mann |fT6l . who prove resolution completeness of deterministic roadmap (DRM) 
planners, which are PRM planners that draw samples from a low-dispersion, deter- 
ministic source. Since we use a deterministic sequence provided by Green-Kelly, 
the combination of Alg. [T| and |4] generates a DRM in path space. 

Lemma 3. For any given configuration of obstacles and any path set 'J'n generated 
by the Green-Kelly algorithm, there exists a sufficiently large N such that any two 
paths Pi,Pj £ J'/ree are in the same connected component of J free if and only if 
Alg.\4\reports that pi ~ pj. 

Proof. LaValle, et al. Ill6i . show that by increasing A^, a sufficiently low dispersion 
can be achieved to make a DRM complete in any given C-Space. By an identical 
argument, given a continuum connected component C C 3^ free, all sampled paths 
in 6 n Vn are in a single partition of D. If q is the radius of the narrowest corridor 
in C, then for dispersion d^ < q, our discrete approximation exactly replicates the 
connectivity of the continuum freespace. D 

Lemma 4. Under the same conditions as in Lemma |5] there exists a sufficiently 
large N such that for any continuum connected component C C 'J free, Alg. \l\retums 
a Tfree such that 'J' free H C ^ 0. That is, every component in J free ho.s a correspond- 
ing partition returned by Alg. |5] 

Proof. Let 5, be the largest open ball of radius r in 6. When 5n < r, Br must contain 
some sample /? e !P. Since C is entirely collision-free, p G Tfree- Thus, for dispersion 
less than r, Vfree contains a path in C. D 

There exists a sufficiently large A^ such that after A^ samples, CP has achieved dis- 
persion 5iv < min(^,r), where q and r are the dispersion required by Lemmas [3] 
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andlH respectively. Under such conditions, a bijection exists between the connected 

components of 7 free and J free- 

Theorem 2. Let D = {CDij . . . |2)m} be a partition of T free as defined by Alg.^ Let 
C = {Ci I . . . |Cm} be a finite partition of the continuum safe, feasible path space into 
connected components. A bijection f : D ^ C exists such that Di C /('D;)- 

Proof. Lemma[3]establishes that / is one-to-one, while Lemma|4]establishes that / 
is onto. Therefore, / is bijective. This shows that by sampling at sufficiently high 
density, we can achieve an arbitrarily good approximation of the connectedness of 
the continuum set of collision-free paths in any environment. D 

Theorem 3. A path interval p may be implicitly tested safe if it is between paths 
Pi and pj such that lJ.H{pi,Pi) < d and a small region at the end of pc has been 
explicitly tested. 

Proof. By Lemma |2l the initial interval of pc is safe because its swath is covered 
by the swaths of the guard paths. Since the small interval at the end of pc has been 
explicitly tested, the whole of pc is collision-free. D 

4 Results 

We briefly summarize some experimental results involving equivalence class detec- 
tion and implicit path collision-testing. All tests were performed in simulation on 
planning problems of the type described in ifTSl . 

Path classification imposes a computational overhead due to the cost of searching 
collision-free paths. Collision rate in turn relates to the density of obstacles in the 
environment. The computational overhead of our classification implementation is 




Fig. 9 Paths tested per time-limited replan 
step in an obstacle-free environment. Path 
testing performance improves by up to 3x 
with the algorithms we present here. Note 
that an artificial ceiling curtails performance 
at the high end due to a maximum path set of 
size 2,401. 



Fig. 10 Paths tested per 0.1 second time 
step at varying obstacle densities. Implicit 
collision-testing allows significantly more 
paths to be tested per unit time. Even in ex- 
tremely dense clutter, implicit path testing 
considers an extra six paths on average. 
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nearly 20% in an empty environment but drops to 0.3% in dense clutter. However, 
implicit collision-testing more than compensates for this overhead. 

Fig.|9]shows the effect of implicit path testing on total paths tested in the absence 
of obstacles. As the time limit increases, the number of paths collision-tested un- 
der the traditional algorithm increases linearly at a rate of 8,300 paths per second. 
With implicit testing, the initial test rate over small time limits (thus small path set 
sizes) is over 22,500 paths per second. The marginal rate declines over time due 
to the aforementioned overhead, but implicit path testing still maintains its speed 
advantage until the entire 2,401 -member path set is collision-tested. 

Fig.[10]presents implicit collision-testing performance in the presence of clutter. 
We compare the implicit collision-tester in Alg.|5]to traditional explicit collision- 
testing. When fixing the replan rate at 10 Hz, implicit path evaluation maintains an 
advantage, despite the overhead, across all navigable obstacle densities. 

5 Discussion and Future Work 

In this paper, we propose an equivalence relation on local paths based on the fol- 
lowing constraints: fixed start position and heading, fixed length, and bounded cur- 
vature. We describe an algorithm for easily classifying paths using the Hausdorff 
distance between them. Path classification is a tool that permits collective reasoning 
about paths, leading to more efficient collision-testing. 

There are many other applications for path equivalence. One example uses path 
class knowledge in obstacle avoidance to improve visibility and safety around ob- 
stacles. Another avenue of future work involves generalizing path equivalence to 
higher dimensions. For instance, an implicit path test for a robot floating in 3D re- 
quires three neighboring paths, while a manipulator arm needs only two. 
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Using Lie Group Symmetries for Fast Corrective 
Motion Planning 

Konstantin Seller, Surya P.N. Singh, and Hugh Durrant-Whyte 



Abstract. For a mechanical system It often arises that its planned motion will need 
to be corrected either to refine an approximate plan or to deal with disturbances. This 
paper develops an algorithmic framework allowing for fast and elegant path correc- 
tion for nonholonomic underactuated systems with Lie group symmetries, which 
operates without the explicit need for control strategies. These systems occur fre- 
quently in robotics, particularly in locomotion, be it ground, underwater, airborne, 
or surgical domains. Instead of reintegrating an entire trajectory, the method alters 
small segments of an initial trajectory in a consistent way so as to transform it via 
symmetry operations. This approach is demonstrated for the cases of a kinematic 
car and for flexible bevel tip needle steering, showing a prudent and simple, yet 
computationally tractable, trajectory correction. 

1 Introduction 

In practice, mechanical systems drift. Be it due to unexpected disturbances or in 
order to refine a coarse plan, corrective motion planning seeks to efficiently adapt a 
given trajectory in an elegant way. This is of particular interest in the agile control 
of underactuated nonholonomic systems. The nature of these systems is that certain 
degrees of freedom can only be controlled in a coupled manner (if at all). This 
makes it computationally hard to determine simple and valid trajectories O UTTl . 
thus it is preferable to efficiently adapt a given trajectory in an elegant way without 
having to start anew. Even in cases where explicit control laws are available, pure 
pursuit tracking is likely to produce unwanted artefacts due to its myopic nature 
151131 . Taking a larger horizon into account increases algorithmic and computational 
complexity, but enables alterations to the path in an elegant way. An example of such 
corrections is illustrated in Fig.[Tl 
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Fig. 1 A car (small rectangle) Is following a previously planned path (solid line) to the goal 
(dot), but got off track due to disturbances. The left Image shows a pure pursuit controller 
trying to get back on track as quickly as possible, resulting in unnecessary turns (dashed 
line). A more natural solution is shown in the second picture where the available space is 
used to elegantly correct the path during the upcoming turn. 



Mechanical systems frequently exhibit symmetries that can be represented as Lie 
groups of translation or rotation ll8l [T4l[T7l . Exploiting this can allow for elegant 
trajectory corrections in a computationally tractable way. This is valuable as the de- 
grees of freedom represented within this symmetry group are often the ones that are 
only indirectly modifiable, and thus hard to control. For example, for most vehicles 
(be it submersible, ground, or airborne) properties such as thrust, speed and turning 
rate can be easily influenced; whereas, the position and heading are hard to control. 
As the latter often exhibits aforementioned symmetries, these methods allow for 
efficient planning and control of this subset. 

Towards this, an algorithmic framework is introduced that allows for elegant 
planning and control systems that exhibit symmetries but are hard to control due 
to nonholonomic constraints. The method works without prior knowledge of con- 
trol strategies specific to the system at hand. Further, it can be used either as an aid 
within an existing planning technique such as rapidly exploring random tree (RRT) 
ox probabilistic roadmap (PRM) algorithms ITTI : or, as presented here, on its own 
in order to adapt an existing trajectory and partly replace a classical controller. 

This approach generalises on the use of Lie group actions for gap reduction dur- 
ing RRT planning. Cheng |4J, for example, introduced a method to insert coasting 
trajectories into an existing trajectory in order to reduce gaps that arise during sam- 
pling based planning. That approach is likely to perform well for twisted paths but 
it comes short for less twisted ones as there is no possibility to shorten any part 
of an initial trajectory to recover from overshooting. The algorithm presented over- 
comes this problem by actually altering existing segments of the initial trajectory in 
a consistent manner. 

The following discussion is framed on the assumption that an initial path has 
been obtained, but needs to be corrected as it does not reach the desired goal. 
Such corrections might be necessary due to gaps arising from sampling based 
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planners, dynamic changes in the environment, or due to disturbances. Furthermore 
the algorithm is designed under the assumption that the initial path is up to some 
degree surrounded by free space, but may still contain narrow doorway situations. 
For ease of presentation, this work concentrates on altering degrees of freedom rep- 
resented by aforementioned symmetry groups. It is understood that the remaining 
degrees are dealt with via a classical planning or control methods 1 1 1 1. 

The remainder of this paper is structured as follows. Section|2]develops the math- 
ematical model and introduces the basic concepts for trajectory alteration. Section[3] 
presents the algorithmic framework for situations without obstacles and shows re- 
sults of its application to the (kinematic) car tracking and flexible needle steer- 
ing problems. This is extended to the cases with obstacles in Section ID Finally, 
Section[5] summarises the ideas presented and discusses future applications. 

2 Mathematical Model 
2.1 Basic Definitions 

Kinodynamics can be defined on a state space X, which itself is a differentiable man- 
ifold with a metric ifTTl . The subset Xobs C X denotes the states that have obstacles, 
and its complement Xfy^e '■— ^ \^obs is the viable free space. For clarity of presen- 
tation, an obstacle free setting (Xobs = 0) is assumed (cases with obstacles will be 
tackled in SectionHJl. The space U C M" represents the the system's control inputs. 
System progress is modelled via a set of ordinary differential equations (ODE) 

x — F{x,u) (1) 

forx G X and u <eU. 

It is assumed that the reader is familiar with the notion of Lie groups. An intro- 
duction to the topic can be found, for example, in 112111 . Let G be a Lie group acting 
on X such that F{.,u) are left-invariant vector fields under the action of G. That is, 
there exists a multiplication law for elements g £ G and x £ X, such that gx &X, 
and for every trajectory x{t) : / C M ^ X and control input u{t) : I -^ U fulfilling 
Eq. ([T]), the product gx{t) also fulfils (|T]| for the same u{t). This setting often allows 
for a decomposition of the state space X in the form 

X^ZxG 

where the manifold Z is the base space and the Lie group G is denoted the fibre 
component]^ The projections from X onto its components Z and G are denoted nz 
and Kg respectively. Common examples of such invariantly acting Lie groups arising 
from the system's symmetry group, are translations (R"), rotations (S0(2), S0(3)) 
or combinations thereof (SE(2), SE(3), M^ x S0(2), . . . ). 



' For the decomposition to exist, the Lie group's action has to he, free. That is, for all x e X 
and g,h IE G it has to be true that gx = hx implies g = h. If G is a symmetry group of the 
system, this is usually the case. 
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Applying this framework to the example of a kinematic car yields a state space 
X containing five dimensions, denoted speed v, turning rate ft), position x and y, and 
heading 6. The control inputs U contain two dimensions, acceleration a and change 
in turning rate p . The equations of motion are 



V 



a 



(0 = p 

X = cos(0)v 
y = sin(0)v 
9 = (Ov . 

Since the car's behaviour is independent of position and heading in the sense that, if 
a valid path is translated or rotated, the resulting path is equally valid, as illustrated 
in Fig. [21 these dimensions form a symmetry Lie group to the system. Thus G should 
be set to be the group of Euler transformations, SE(2), representing x, y and Q. The 
remaining base space Z is spanned by v and ft). Thus 

X = R- X SE(2) . 

Introducing some notation simplifies matters. Let / C M be a closed finite interval. 
Then I and /+ denote the lower and upper boundary values respectively, such that 

A time dependent control input is considered to be a function u : I^ ^ U that maps 
from a closed finite interval /„ C R into the control space U. Integrating such a 
control input over time via the ODE ([T]i gives rise to a path in state space X that is 
dependent on an initial state xq and time t. Such integrated paths can be written as 
functions (t>u{xo,t) : X x I^, ^ X with the properties 





Fig. 2 The car (rectangle) has a valid initial state and path depicted in bold. It follows that 
the translated and rotated initial states and paths are equally valid. 
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and 

Given two time dependent control inputs u : I,, ^ U and v : ly ^ U with /+ = I~ , 
u*v: [/j7 , /,t] -^ X is defined as the concatenation of the two functions u and v such 
that 

u{t), if te[i-,i+) 



' v[t), otherwise. 

This notation may also be used in cases where /+ ^ 7^7 . In these cases a suitable re- 
parameterisation of/,, is performed implicitly. Note that when using this notation for 
two integrated paths in state space X, the concatenation results in a single continuous 
path if and only if the final state of the first path coincides with the initial state of 
the second path. When this is the case, the resulting path is equivalent to integrating 
the concatenated control inputs directly, thus 



2.2 Trajectory Transformations 

It is hard to find a solution for the planning problem of connecting two predefined 
points Jtstait and .Tgoai in X |l3l[T0l. In the general case, this leads to running a search 
over all time varying control inputs. As the space of all possible control inputs can 
be too big to search exhaustively, many algorithms focus on relatively small subsets 
and either run a search over a discrete path set 16j or run a non-linear optimisation 
algorithm or search over a continuous path set QID. The former, by its very nature, 
can only reach a discrete subset of X, where as the latter typically involves reinte- 
grating the whole trajectory 0uixstaix,t) in each step of the optimisation process. 

Using operations given by a Lie group to transform a valid trajectory allows for 
the reuse of large parts of a previously calculated <&»(xstart,0 as long as changes to 
the trajectory happen in a compatible way. Thus searching a continuum can be done 
without complete reintegration. 

Let u and v be two time dependent control inputs that differ in some region, but 
coincide otherwise. They can be split up as 

M = Ml * M2 * M3 

and 

V = Ml * V2 * M3 

where mi and M3 represent the parts that are common to both. Note that the lengths 
of the middle segments /j,^ and /,,, do not necessarily have to be equal. Starting both 
trajectories at a common initial state xq G X yields 



0u{xo,t) ^ <l>vixo,t) for telui 
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Fig. 3 Three trajectories for a car (rectangle), all resulting from the same control Input. 
The behaviour Is sensitive to initial conditions (speed and turning rate), causing different 
trajectories. 

In general, equality of the third part of the control inputs, ii^, can not be used, as the 
final states of the middle segments, 0u{xq,I^) and 0u{xo,ll), need not coincide. 
Using different states as initial states for the third part of the path can result in a 
variety of different trajectories as illustrated in Fig. |3] If however it is assumed that 
the final states of the middle segments M2 and V2 only differ on the fibre component 
G but coincide on the base space Z, the similarity of the third parts of the trajectory 
can be exploited. Having equality on the base space as in 



;rz(^„(xo,/+))-;rz(0„(xo,/+)) 
implies there exists a transformation g 6 G such that 

0v(jfO,/+)=g^„(.Xo,/+). 



(2) 



(3) 



In the case of the kinematic car, Eq. (O can be interpreted as having identical speed 
V and turning rate (O. Then Eq. (O yields the translation and rotation necessary to 
transform one state into the other. Because the equations of motion are invariant 
under translation and rotation, the resulting third parts of the paths will be translated 
and rotated versions of each other as illustrated in Fig. |4l In the general case, the 
same line of reasoning on invariance yields 



0,.,{0,.{xo,I+),t) 



^«3(g^«(xo, /+),?) = ga>„3(^„(xo,/+),f) 



(4) 



Looking at this result from a viewpoint of computational complexity, Eq. Q can 
save calculation time. Given <&„, the computational cost of 0,, is mainly the cost 
of integrating the second segment given by V2- The third segment defined by M3 
can be calculated directly by the use of group operations. In particular, during non- 
linear optimisation, the final state 0y{xo,I^) is typically the only one of interest. 
Thus, there is no need to actually transform the whole third segment of the path. 
Instead one can determine the trajectory's final state directly. As a result, the cost 
for cl>y{xQ,I:^) is linear in the size of /y, . 
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Fig. 4 A car (rectangle) follows two different paths resulting from control inputs that only 
differ on a region in the middle but are identical otherwise. The paths coincide up to the first 
marker (dot). After that, the paths differ. However at the respective second markers, speed 
and turning rate are identical for both paths and thus the remaining parts of the path are the 
same, just transformed. 

2.3 Optimising a Trajectory 

Given a time dependent control input u and a corresponding trajectory <&„ {xo,t), one 
might be able to find an alteration u'^ that stretches (or compresses) the trajectory's 
behaviour on the base space Z over time. That is 

nz{^u^ixo,t)) = nz{<t>uixo,ct)) (5) 

for a stretch factor < c G R. In particular, this yields identical final states on Z, 

nzi<l>u'^ixo,I+)) = rcz(^„(xo,/+)) . 

In the case of the car, for instance, this could map to reduced accelerator commands 
resulting in a longer distance travelled by the time the target speed is reached. While 
the stretching operation does not change the end result on the base space Z, it does 
alter the fibre G, thus emphasising or weakening features of the trajectory. For the 
car, the stretching operation can be calculated by dividing the control inputs a and 
p by c while multiplying the time they are applied by c. 

Combining the results obtained so far, an efficient tool for altering a trajectory 
during a non-linear optimisation process can be built. Let ^„(xo,f) be a trajectory 
given by a split control input 

M = Ml * . . . * M„ 

and starting point xq £ X. Changing a single ui to Uj' =: v, results in a Lie group 
operation gi e G as of Eq. Q. Repeating this, one is able to alter several or even all 
segments of the path at once in order to get a new control input 



vi * . 



■ *v„ 



where all v, result from some mJ'. In cases where ct ~ 1, and thus the segment is 
unaltered, the corresponding g, is set to the identity element 1 G G without further 



44 



K. Seller, S.P.N. Singh, and H. Durrant-Whyte 



calculation. Assuming 0„{xo,t) is given and the changed segments 
^v,(^M(.^Oj^,r_, ),0 and transformations gi have been calculated, the new trajectory 
0v{xo,t) is computed efficiently using group operations only. Iteratively applying 
& yields 

and thus 

In particular, one can write the final state of the trajectory as 



0,.{xo,I+)^g„...gl0uixo,I+ 



(6) 



Clearly not much is saved in cases where all segments of the trajectory have been 
changed (i.e., all c, ^ 1). However, if only a small fraction of the control input has 
been altered, then it is only necessary to reintegrate the fibre component of those 
altered segments. Thus the computational cost for calculating the new trajectory, or 
directly its end point, is linear in the length of the changed segments plus the cost of 
a few additional group operations. 

Note that it is possible to perform the calculations of g,- and ^v, (^« {xo , I^._ . ) , ?) 
separately for each segment, independent of what is done to other segments. Thus, 
for another transformation using some c'-, all results where c] = c, can be reused. 
This speeds up things significantly for gradient calculations as will be detailed later 
and also allows for parallel computation. 




Fig. 5 A scaling operation has been applied to the bold segments of the left hand path to 
derive the right hand trajectory. Only the hold segments had to be reintegrated, the remainder 
Is Identical. 



3 Path Correction Algorithm without Obstacles 

For path correction, it will be assumed that an initial path <&„(xo,?) as well as its 
control input u and initial state xq have been given. Furthermore, the path's final 
state 0,, (xq , /+ ) does not coincide with the goal Xgoai, but is somewhat in the vicinity 
of it. The objective is to alter the trajectory ^„ in such a way that its final state 
matches Xgoai. It will be assumed that the correction needs to be done in the fibre 
component only and that there are no obstacles present. This will be achieved in two 
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Fig. 6 The segments (bold) chosen for the trajectory on the left are unable to span the space 
well as they enable moving the final state horizontally and vertically, but prohibit alteration 
to the car's heading. The selection shown on the right is superior because changes in all 
directions including heading are possible. 

steps: (1) a small and suitable set of path segments will be selected for stretching 
operations; (2) matching stretching factors c, will be determined for said segments. 
When selecting path segments, it is advantageous to select exactly as many seg- 
ments as there are dimensions in the Lie group G. Using less segments results in too 
few degrees of freedom when altering the trajectory and thus failure to span a whole 
neighbourhood of the final state f&„ (.»:o , /+ ) . Using more segments than dimG leads 
to undesired behaviour as the solution is no longer unique. Furthermore, segments 
are chosen in such a way that the directions they move the trajectory's final state 
into have the potential to span the space well as illustrated in Fig. |6l This can be 
formalised by considering the derivatives 



evaluated at c, = 1 . As above, v represents the control input u with some segments 
Ui replaced by their scaled versions uf and, again, g,- G G denotes the resulting Lie 
group transformation. The quality of a selection of dimG segments can then be 
measured by analysing the condition of the resulting Jacobian 



/: 



d0,{xo,I+) 

^(ci,---,CdimG) 



dgi0u{xo,lu) dgditnG^„{xo,Iu 



iCi 



dcdi 



imG 



(7) 



evaluated at c, = 1 for all /. The derivatives in the matrix on the right hand side are 
written as column vectors. If the matrix's condition is small, it has the potential to 
span the space well. 

Since each column of / in Eq. (O is independent of the remaining segments, the 
derivative has to be calculated only once. Thus, in practise, a solution is to select a 
larger set of non-overlapping segments and out of that then randomly draw selec- 
tions of dimG elements for further testing. The selection with the smallest condi- 
tion of the resulting Jacobian is then chosen. An exhaustive search for the optimal 
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selection is not necessary since it is sufficient to remove poor candidates. Taking a 
few random samples is often enough. 

As optimisation algorithms typically work by minimising a target function |[2l, 
here the distance of the path's final state to the goal, it might seem tempting not to 
use the condition of the final state's Jacobian as presented here, but instead estimate 
the convergence rate of that target function directly via its second order approxi- 
mation and the eigenvalues of the Hessian lfT6l l4l. In tests however this proved to 
perform poorly. 

Once a set of segments is chosen, the values for the c, need to be determined in 
order to actually improve the trajectory. Therefore a target function /(ci , . . . , Cdimc) 
is defined as the distance between 0v(xo,/+) and x<,oai- It is then minimised using a 
Conjugate Gradient method lfT6l [2ll. Estimating the gradient of / at (cj, . . . ,CdimG) 
is done by taking into account the function value /(ci , . . . , Cdimo); as well as those 
resulting from going a small step into each direction, /(ci, . . . ,c,- + e, . . . ,CdimG)> 
naively resulting in dimG+ 1 integrations for each segment. However, since only 
two distinct values, c, and c, + e, are used for each dimension of G, the calculated 
gi can be recombined to obtain all function evaluations necessary. Thus, the cost to 
estimate a gradient is two integrations per segment plus some group operations. 

Pseudocode for this algorithm is presented in Algorithm [T] It was used for the 
path depicted on the right hand side of Fig. [T] as well as the example presented in 
Fig. |7] Implementing this algorithm for more complex 3D cases, such as bevel tip 
needle steering, the state space X consists of eight dimensions: Insertion speed v, 
turning rate (O as well as six degrees of freedom representing position and orien- 
tation in three space. Thus the base space Z represents v and (O whereas G equals 
the group of Euler transformations SE(3). Following previous notation in this do- 
main lfT5l[T7i . SE(3) is represented using homogeneous 4x4 matrices g = {q\) 
where R G S0(3) is a rotation matrix and t S M^ represents translation. The control 



Algorithm 1. Path correction algorithm without obstacles 
u <— current plan 
M <^ select set of at least dimG non overlapping segments of /„ 

Cmin ^ ~ 

for / = 1 to mmimaxSelections, number of selections possible) do 
S ^- draw new selection of dim G random elements of M 



J ^ ^ 3g>'!>„M) ^ ^ dg.^r^o'PA^^oj;^] j {calculated for the segments stored in S} 
if cond(/) < Cmin then 
Cmin ^ cond(y) 

end if 
end for 

optimise ci,...,CdimG 

V ^- scale the segments of u stored in 5min according to values of ci , . . . , CdimG 
until dist {xgo^\,0v{xo,I^)) minimal 
return v 
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Fig. 7 A car (rectangle) is trying to reach tiie goal (dot). The dashed line shows the initial 
path that fails to reach the goal. By altering the segments depicted in bold, the solid path is 
created. 




Fig. 8 Path correction for a needle steering case. The needle needs to reach the goal (dot), but 
the initial plan, depicted by the dashed line, misses it. The solid line is the correction made 
by the path correction algorithm. 



space U has two dimensions, acceleration a = v and change in turning rate p = 6). 
The remaining equations of motion are given by 
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where the constant K is the curvature of the needle's trajectory and se(3) is the Lie 
algebra of SE(3). An example for path correction using this system is presented in 
Fig.l 
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Since the algorithm works by enlarging or shrinking certain sections of the tra- 
jectory, it can not perform well in cases where the trajectory has too few features. 
Especially in cases where the path consists only of a straight line or a section of a 
circle, it is impossible to find segments that span the space well in a way discussed 
previously and illustrated in Fig.[6l 

4 Path Correction Algorithm with Obstacles 

When dealing with obstacles, an inversion of perspective is helpful. Up to now the 
initial path was considered to start at the robot's current state and the final state was 
then optimised. However, it is equally valid to anchor the initial path at the goal 
state Xgoai and consider the robot to be at a state Xcun- that does not coincide with the 
path's initial state Xstart == ^« (-^^start , C ) • 

It will be assumed that the initial path is a collision free trajectory (P,, (xstart, with 
't'u{xstm,lu) = -^goai- -^obs does not have to be empty, but it is assumed that there is 



Algorithm 2. Path correction algorithm with obstacles 
u ^- original plan 
V ^- current plan 
Xcun- ^ system's current state 
S ^- empty stack 
repeat 

if <Pv(-VcuiT, in collision then 

vi , V2 ^- V split at point of first collision 
else 

Vi,V2 ^v,0 
end if 

Jfnew *— find intermediate goal using u 

I'l ^- run algorithm without obstacles (Alg.[Tll for ©y, (xcun-jf ) in order to reach x^ 

if 4>,., (xcun-jf) collision free then 

S.push ^ Vi,Xcurr 

V,Xcurr ^ V2, 4>v,, (xcunj^,) 

else if $v, (xcurr, J',^) collision free then 

V ^ Vi * V2 

else if S not empty then 

VO,XcuiT ^ S.pop 

V ^ vq * Vi * V2 

else 

return FAIL 

end if 
until V = 
while 5 not empty do 

VO,Xcun^S.pop 

V ^ V() * V 

end while 
return v 
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Fig. 9 A car tracking a path (thin solid line) through terrain with obstacles to reach the 
goal (dot). While driving, base space and control inputs (v, ft), a and p) are disturbed by 
random errors in form of a Wiener process. Errors on the base space are corrected by use 
of a feedforward controller with a saturation function, while resulting errors in position and 
heading are corrected repeatedly using the path correcting algorithm. The actual path taken is 
depicted by the bold solid line and the currently planned path that is to be followed is shown 
by the dashed line. 



a certain amount of free space surrounding the trajectory most of the time that can 
be used for corrective actions. The robot is currently at Xcurr and is following a path 
defined by v that is derived from, but that might not coincide with u. So ^v(-^curr,0 
does not necessarily reach the goal Xgoai- Allowing a discrepancy between u and v is 
advantageous when making multiple corrections; for example, when repeated online 
calculations are performed while executing a path under disturbances. In this setting, 
the initial path u will be kept constant during the whole process, while alterations 
are made to v only. 

If no collisions occur in <l>v{xcuTr,t), alterations to v can be made directly using 
ths path correction algorithm without obstacles (Algorithm [Hi. Otherwise, in case 
of collisions, the first point of collision is found as 

fcoi := min{/ e /,, | ^v(xcurr,f) e ^obs} 

and the path can be split such that v = vi * V2 and fcoi — Av '^° 8^'' around the 
obstacle, it is necessary to correct the path in two steps, first 0^,^ {xcun,t) using a 
new intermediate goal Xnew & ^free and then V2- 

To define .Xnew, the colliding point 0^^ (j^curr, A^^ ) i** pulled towards the correspond- 
ing point .^Torig of 4>„(xstart,f) where, due to scaling operations performed on v, the 



50 



K. Seller, S.P.N. Singh, and H. Durrant-Whyte 




Fig. 10 A needle has to reach a goal (dot) and has an Initial path depicted by the dashed 
line. Due to an offset In the Initial position, the planned trajectory would result in the dotted 
line, colliding with an obstacle and missing the goal. The path Is corrected and a valid path 
depicted by the solid line Is created. 



time t is not necessarily identical for u and v any more. This is done by parameter- 
ising a straight line connecting .Torig with Oyj (.^curr,A^) by 5 : [0, 1] ^ X such that 
5(0) =JCorig andi'(l) = 0,,, (xcurr, ^1^^)13 Using i', the intermediate goal is defined as 

^new -^^ SyCC ) , 

where n is the smallest « G Nsuchthati'(a") GX^ee and a G [0, 1). The convergence 
rate a determines how fast the trajectory should be pulled back towards the original 
path and away from the obstacle. A large a results in staying closer to the obstacle 



^ Note that, as states already coincide on the base, the line s only has to be defined In G 
and is constant In Z. In most cases, it is intuitive what a suitable choice for a straight 
line within G should be and how it can be implemented easily. In less obvious cases, 
the exponential map exp : g ^ G can be used, where g is the Lie algebra of G. Let d = 
nG{xong)~ i^ci^vii^cunj^.)) be the difference between the two states to connect. The 
points have to be close enough such that d lies within the identity component of G (i.e. 
the image of exp), as otherwise an easy connection is not possible. Then the line can be 
defined as s{t) := ;rz(.i:ori„) exp(/exp^' (d)) for a suitable pre-image exp^' (d). 
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and thus in a higher change that future corrections are necessary; a smaller a on 
the other hand pulls the trajectory more aggressively towards the original trajectory, 
preventing effective use of the available free space. The presented implementation 
uses a = 1/2- 

Using Xnewi the path correction algorithm without obstacles is run over vi to get 
a new plan w\ and thus w ^w\ *V2- Note that it is acceptable if the new trajectory 
does not actually reach jfnew itself as the main purpose of the operation is to pull 
the path away from the obstacle. What has to be considered tough are collisions of 
^wi(-^curr,0- l^f ^vvi (-fcurriO IS Collision free, the process continues by recursively 
applying the path correction algorithm with obstacles on V2. If ^vvi(jccuit,0 is in 
collision, two cases have to be considered. If the final state ^wi (-^curriAti) ^^ ^^ 
collision, the optimisation run did not get close enough to jtnew because the path 
given by vi was too short or too featureless for the algorithm to perform well. In 
cases where vi is not the first part of the path due to a recursive call, w (and thus vi) 
can be extended and a new attempt can be made. Otherwise the system is too close 
to an obstacle for suitable correction, and the algorithm is considered failed. If the 
final state 't>w,{xcun:,tw,) is in Xfiee, the optimisation run was successful but the 
alteration introduced a new collision. Then a recursive call on the altered plan w 
and the starting point Xcun- is necessary to get rid of the newly introduced collision. 
Pseudocode for the complete framework is given in Algorithm[2l 

The algorithm can be used either offline within another planning framework (e.g. 
PRM, RRT) |4 111 or online while tracking a previously planned path. When applied 
in the latter approach, it is important that the initial trajectory u is kept unaltered 
during the whole process. Examples of how the algorithm performs are presented in 
Figs.|9]and[Il 

5 Conclusion 

An algorithmic framework was presented that allows for elegant and fast path cor- 
rection while preserving the character of the initial trajectory, thus eliminating the 
need for expensive re-planning from scratch. The algorithm has been implemented 
for a kinematic car as well as for needle steering and simulations for the system's 
behaviour under disturbances have been performed. Future work will include im- 
plementing the system on experimental field systems currently under development. 
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Asynchronous Distributed Motion Planning with 
Safety Guarantees under Second-Order Dynamics 

Devin K. Grady, Kostas E. Bekris, and Lydia E. Kavraki 



Abstract. As robots become more versatile, they are increasingly found to oper- 
ate together in the same environment where they must coordinate their motion in 
a distributed manner. Such operation does not present problems if the motion is 
quasi-static and collisions can be easily avoided. However, when the robots follow 
second-order dynamics, the problem becomes challenging even for a known envi- 
ronment. The setup in this work considers that each robot replans its own trajec- 
tory for the next replanning cycle. The planning process must guarantee the robot's 
safety by ensuring collision-free paths for the considered period and by not bringing 
the robot to states where collisions cannot be avoided in the future. This problem 
can be addressed through communication among the robots, but it becomes compli- 
cated when the replanning cycles of the different robots are not synchronized and 
the robots make planning decisions at different time instants. This paper shows how 
to guarantee the safe operation of multiple communicating second-order vehicles, 
whose replanning cycles do not coincide, through an asynchronous, distributed mo- 
tion planning framework. The method is evaluated through simulations, where each 
robot is simulated on a different processor and communicates with its neighbors 
through message passing. The simulations confirm that the approach provides safety 
in scenarios with up to 48 robots with second-order dynamics in environments with 
obstacles, where collisions occur often without a safety framework. 

1 Introduction 

This paper considers multiple autonomous robots with non-trivial dynamics oper- 
ating in a static environment. The robots try to reach their individual goals without 
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collisions. Such scenarios are becoming increasingly interesting. For instance, 
consider the case of vehicles moving in a parking lot or going through a busy inter- 
section, or unmanned aerial vehicles that carry out complex maneuvers. These ex- 
amples involve second-order systems, which cannot stop instantaneously and must 
respect limits in the second-order derivatives of their state parameters. For such 
systems, collisions with other robots or obstacles cannot be easily avoided. 

Real applications also require the solution of such problems in a decentralized 
manner. This work imposes a requirement for a decentralized solution and considers 
robots that replan their trajectories on the fly. Replanning allows robots to consider 
multiple alternative trajectories during each cycle and provides flexibility in chang- 
ing environments. To coordinate the robots, this work utilizes communication. A 
planning algorithm makes use of information collected through communication to 
avoid collisions for the next cycle and ensure that robots reach states from where 
collisions can be avoided in the future. The duration of the cycle is the same for 
all robots, but the robots are not synchronized. Hence communication of plans can 
happen at any point and the robots need to operate safely in the presence of par- 
tial information about the plans of their neighbors. An asynchronous, distributed 
framework is developed that guarantees the safety of all robots in this setup. 

Background. Safety issues for dynamical systems were first studied many years 
ago. Collision-free states that inevitably lead to collisions have been referred as 
Obstacle Shadows Ii24j . Regions of Inevitable Collision Ii20i or Inevitable Collision 
States (ICS) IfTSll . A study on ICS resulted in conservative approximations lfT3]| and 
generic ICS checkers |J2T|. It also provided 3 criteria for motion safety: a robot must 
(i) consider its dynamics, (ii) the environment's future behavior, and (iii) reason 
over infinite-time horizon lfT2l . This line of research, however, did not deal with 
coordinating robots as the current paper does. 

Reactive methods, such as the Dynamic Window Approach [111 and Velocity 
Obstacles ifTOl . can enable a robot to avoid collisions for unknown on dynamic en- 
vironments. Many existing reactive planners, however, do not satisfy the criteria 
for motion safety 111211211 . Path deformation techniques compute a flexible path, 
adapted on the fly to avoid moving obstacles IIT8ll27l . but do not deal with ICS. 
Reciprocal Velocity Obstacles (RVOs) [4] involve multiple agents which simulta- 
neously avoid one another without communication but do not deal yet with ICS. 
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Fig. 1 A sample run in the office environment (left to right). Links show communicating 
robots. 
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A related control-based approach IfTTl deals with second-order models of a planar 
unicycle but does not provide guarantees in the presence of obstacles. 

In contrast to reactive approaches, this paper focuses on planning safe paths. 
Planning has a longer horizon so it does not get stuck in minima as easily and 
extends to high degrees-of-freedom systems. Reasoning about safety during plan- 
ning focuses the search on the safe part of the state space. In this work planning 
is achieved using a sampling -based tree planner ll20l[T5l l2l. Alternatives include, 
among others, navigation functions [8| and lattice-based approaches Il23l . 

Braking maneuvers have been shown sufficient in providing safety in static envi- 
ronments [26 1 and have been combined with sampling-based replanning |l5j[2l. For 
dynamic environments, relaxations of ICS are typically considered, such as r-safety 
IIT4II . This notion guarantees no collision for t seconds in the future for each node of 
a sampling-based tree. A sampling-based planner was tested on air-cushioned robots 
moving in dynamic environments, where an escape maneuver was computed when 
the planner failed to find a solution ITSJI . Learning-based approximations of ICS 
can also be found 1X6)1 . as well as approximations of statextime space obstacles O. 
Other works focus on the interaction between planning and sensing, and point out 
that planning must be limited within the robot's visibility region Ifn i25]| . The current 
paper extends the authors' earlier work |[3|, which integrated a sampling-based plan- 
ner with ICS avoidance [2 1 to safely plan for multiple robots that formed a network 
and explored an unknown workspace. The previous work required a synchronous 
planning operation, which simplified coordination. 

Planning for dynamic networks of robots has been approached by a combination 
of centralized and decoupled planning Q, without considering, however, the ICS 
challenge. Centralized planning does not scale and decoupled approaches, which 
may involve prioritization |9j or velocity tuning Il22l . are incomplete. The existing 
work follows a decoupled approach for performance purposes. In contrast to veloc- 
ity tuning, it weakly constraints the robots' motion before considering interactions 
since it allows multiple alternative paths for each robot at each cycle. At the same 
time, it does not impose priorities but instead robots respect their neighbors in a way 
that emerges naturally from their asynchronous operation. 

Contributions. This work extends the range of problems that can be solved effi- 
ciently with guarantees for ICS avoidance. The paper presents a general framework 
for independent but communicating second-order robots to reach their destinations 
in an otherwise known environment. The framework is fully distributed and relies on 
asynchronous interaction among the robots, where the robots' replanning cycles are 
not synchronized, the robots have no knowledge about their clock differences and no 
access to a global clock. It is based on the exchange of contingency plans between 
neighboring robots that are guaranteed to be collision-free. While contingency plans 
have been used in the past, this work emphasizes the importance of communicating 
contingencies in multi-robot scenarios and studies the asynchronous case. A proof 
that the proposed scheme guarantees ICS avoidance is provided. The framework 
has been implemented on a distributed simulator, where each robot is assigned to a 
different processor and message passing is used to convey plans. The experiments 
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consider various scenarios involving 2 to 48 robots and demonstrate that safety is 
indeed achieved in scenarios where collisions are frequent if the ICS issue is 
ignored. The experiments also evaluate the efficiency and the scalability of the 
approach. 

2 Problem Statement 

Consider robots operating in the same known workspace with static obstacles. Each 
robot R' exhibits drift and must satisfy non-holonomic constraints expressed by dif- 
ferential equations of the form: x' - f'(x', u'), g'{x',x') < 0, where x' G X' represents 
a state, u' is a control and f',g' are smooth. The subset of the state space X' that 
does not cause a collision with static obstacles is denoted as X'r. The robot model 
used in this paper can be found in Section [5] and involves acceleration controlled 
car-like systems, including versions with minimum positive velocity. 

Each R' is located at an initial state x'{0) and must compute plans that will bring 
it to its individual goal x'(t„,a.\) without collisions and within finite time t„,ax- Then: 

• A plan is a sequence of controls p{dt) - {{u\,dti),..., (u„,dtn)} (dt - 2; cff,). 

• A plan p(dt) executed at state x{t) defines a trajectory: n{x(t),p{dt)), which is a 
sequence of states. 

• A trajectory is feasible as long as it satisfies functions /' and g' for robot R'. 

• A plan pidt) is valid at state x(t), if it defines a feasible trajectory n(x{t), p(dt)). 

• A state along n(x{t), p{dt)) at time t G [f : r H- cff] is denoted as x[n(x(t), p(dt))](t ). 

• A feasible trajectory n{x{t),p{dt)) is collision-free with respect to the static ob- 
stacles if: y t &[t:t + dt]: x[n(x(t),p(dt))](t) € Xf. 

• For a trajectory concatenation (figure below) n {n{x(t),p{dt)),p (dt )), plan 
p{dt) is executed at x{t) and then p (dt ) is executed at state: x[n(x(t), p(dt))](t+dt). 

• Two trajectories for/?' andR^ are compatible: jT'(x'(t'),p(dt'))^jT-'(x'(tJ),p'(dt^)) 

as long as: .... 

x[jT'](t) ^ x[jT-'](t) V t G[maxit\t-'):minit' + dt',t-'+dt-')] 

where x' x x' means that R' in state x" does not collide with R^ at state x-' . The 

corresponding plans p(dt'), p{dt^) are also called compatible at states x'(/'), x'(t-'). 

The robots are equipped with an omnidirectional, 

range-limited communication tool, which is reliable . .. 

^ trajectory concatenation: 

and used for coordination and collision avoidance. jt'(7t(x(t), p(dt)), p'(dt')) 

The robots within range of R' define the neighbor- ^lu ^^jHi>__ -----;*(x(t).p<dt))i(t+di) 

hood N'. A robot has information about other robots 

only if they communicate. 

Given the above notation, the problem of distributed motion planning with 
dynamics (DMPD) can be defined as follows: Consider m robots with range-limited 
communication capabilities operating in the same workspace with obstacles. Each 
robot's motion is governed by second-order dynamics specified by /' and g'. Ini- 
tially, robot R' is located at state x'{0), where x'(0) G X'r and ViJ : x'(0) x x'(0). 
Each R' must compute a valid plan p'{t,„ax) so that: 
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• .T[y(x'(0),/y(?„,„.v))](Wx) = x'^itmax) (i.e., the plans bring the robots to 
their individual goals within time tmax), 

• V /, Vr G [0 : f,„ax] : x[y(x'(0),/?'(fmajc))](0 ^ -^^ (i-e-. the resulting trajectories 
are collision-free with static obstacles) 

• and V i,j : n'{x'{Q),p'{t,„ax)) - ^KxHO),pKt,„ax)) O-e-, the trajectories are 
pairwise compatible from the beginning and until all the robots reach their goals). 

3 A Simple Framework without Safety Guarantees 

This paper adopts a decentralized framework for scalability purposes. Each robot's 
operation is broken into intervals {[f^ : fj],[fj,?2]' •••'[4 ■ ^,', i]'---)j called cycles. 
During [f^ j : /J,], robot R' considers different plans 77' for cycle [fj, : t'^ j ], given the 
future initial state x'(f,',). Through coordination, R' selects plan p' ([fj, : f ^ J). 

It is assumed that the duration of each cycle is constant and the same for all 
robots: V/, Vm : f^ j - fj, - dt. Nevertheless, the robots do not have a synchronous 
operation: the cycles among different robots do not coincide and rj, is typically dif- 
ferent than Iq. Synchronicity is a restrictive assumption, as it requires all the robots 
to initiate their operation at exactly the same time although they may be located in 
different parts of the world and may not communicate their initial states. 

Given this setup, Algorithm l3. ll outlines a straightforward approach for the single 
cycle operation of each robot that tries to find compatible plans. During [r^ j : rj,], R' 
computes alternative partial plans IT for the consecutive planning cycle. In parallel, 
7?' listens for messages from robots in neighborhood A^'. The messages contain the 
selected trajectories for each robot. When time approaches fj, - e, R' selects among 
all trajectories that are collision-free and compatible with the neighbors' messages, 
the one that brings the robot closer to its goal. If such a trajectory is indeed found at 
each iteration, then the DMPD problem is eventually solved by this algorithm. 



Algorithm 3.1. Simple but Unsafe Operation oiR' During Cycle [f^ ^ : r,',] 

n' <- and /7^' <- 
while f < fj, - e do 

n'{x'{tl,),p'(tl, : t' .)) «— collision-free trajectory from a single-robot planner 

/7'V/7'' Un\x\f„),p\f„:f^^^)) 

it RJ 6 A'' is transmitting a trajectory n^ then 77"' «— IJ'^' U k^ 
for all n' 6 /7' do 

for all nJ 6 /t'^' do 

if :?r' :?fc n^ (incompatible trajectories) then 77' <^ IT' -n' 
n\ «— trajectory in 77' which brings R' closer to the goal given a metric 
Transmit n'^ to all neighbors in A'' and execute n\ during next cycle 
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4 A Safe Solution to Distributed Motion Planning with 
Dynamics 

A robot following the above approach might fail to find a trajectory n' . This section 
describes a distributed algorithm that guarantees the existence of a collision-free, 
compatible trajectory for all robots at every cycle. 

A. Safety Considerations - Inevitable Collision States: One reason for failure 
is when the single-robot planner fails to find collision-free paths. This is guaranteed 
to happen when x'{t'^ is an ICS. State x{t) is ICS with regards to static obstacles if: 
V p{oo) : 3dtG [t,co) so that x[n(x(t), p(co))] i Xf. 

Computing whether a state is ICS is intractable, since it requires reasoning over 
an infinite horizon for all possible plans. It is sufficient, however, to consider conser- 
vative methods that identify states that are not ICS lfT3l l2l. The approximation rea- 
sons over a subset of predefined maneuvers r(oo), called here contingency plans. 
If/?' can avoid collisions in the future with static obstacles at x'(r„) by guarantee- 
ing that a contingency plan y'ipo)^ r'(oo) avoids collisions over an infinite horizon, 
then x'{tn) is not ICS with regards to static obstacles. For cars, braking maneuvers 
are sufficient since it is possible to reason over an infinite time horizon whether these 
plans will collide with static obstacles. Circling maneuvers can be used for systems 
with minimum velocity limits, such as airplanes. 

Multiple moving robots pose new challenges for ICS. Trajectories n' and n-' 
may be compatible for the next cycle, but the corresponding robots may reach 
states that will inevitably lead them in a future collision. Thus, safety notions 
have to be extended into the multi-robot case. It is still necessary for computa- 
tional reasons to be conservative and focus only on a set of contingency plans. 
For m robots {R\R^, . . .,R'"} executing plans {p^{dt^),p^{dt^), . . .,p"'{dt"')} at states 
{x^ (t), x^{t), ..., x'"{t)}, state x'{t) is considered a safe state if: 

3 7'(oo) G r'(co) so that V t €[t,oo): x[n'ix'it),yioo))](t ) g Xf and 
Vye [l,m], 7^ (, 3 7^(co) G r^(co) : 7T\x'(t),y(oo))^7T'(7T'(x'(t),pHdt')), y'i^))- 

In the above definition, dt' is the remaining of robot /?''s cycle past time /. Note that 
a trajectory concatenation is used forR^'s trajectory. In this trajectory concatenation, 
p^{dt^) is executed for time df and then the contingency 7^(00) is applied. The 
reason is that as robots decide asynchronously, it may happen that at t, robot R^ 
has already committed to plan p^{dt^). Extending the assumption in the problem 
statement about compatible starting states, the following discussion will assume 
that the initial states of all the robots are safe states. Then an algorithm for the DMPD 
problem must maintain the following invariant for each robot and planning cycle: 
Safety Invariant: The selected trajectory ji' (x'(t'^),p'(t'„ : t'^ j)): 

a) Must be collision-free with obstacles. 

b) Must be compatible with all other robots, during the cycle (fj, : t'^ j): 
tt' (x'(4), p'iti : r; J )) X n\x}(t^„), pKi : < 1 )), Vj ^ /. 
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c) The resulting state x[n'](f^ j) is safe for all possible future plans p-'it'^ j : oo) 
selected by other robots (j + i). In other words, the concatenation of tt' with y'ioo) 
must be compatible with the concatenations of other vehicles, i.e., V7 ^t /: 
7r'(7r'(x'(r;,),/7'(r;,:f; i)),^^^))^ nJ(n\xJ(i),pJ(i:f^ ,)),7^(oo)). 

Point c) above means that R' has a contingency plan at x[n' ](t'^ j), which can be 
safely followed for the other robots' choices given the algorithm. If the invariant 
holds for all the robots, then they will always be safe. If for any reason a robot can- 
not find a plan that satisfies these requirements, then it can revert to its contingency 
that guarantees its safety. 



Algorithm 4.1. Safe and Asynchronous Operation of R' During Cycle [f . : fj,] 



1 

2: 
3: 

4: 
5: 
6: 
7: 
8: 
9 

10: 
11 
12: 
13: 
14: 
15 
16 
17: 
18: 
19 
20: 
21 
22: 



11 -^ VI, 11 p,-g-i' -^^ VI, 11 new ^^ VI 

for all RJ 6 A'' do 

(i.e., include all past trajectories and attached contingencies of neighbors) 
while t < t'„ - e do 

n'(x'(t'^),p'(t'i^ '■ t' ,)) -I— collision-free trajectory from a single-robot planner 
TTiy <— jt'( n'{x'{t'^),p\t'^ : f|,^[)), 7(f',+ i '■ °°)) (i-S-> contingency concatenation) 
if V f 6 [t'^^^ : 00) : xWy]{t) 6 Xf then 
/7' <- W U n^y 
for all niy 6 77^'^^ do 
if n'y t Jiy then 
/7'V/7'-4 
if RJ 6 N' is transmitting a trajectory and an attached contingency then 

for all n'y 6 /7' do 
for all 4 e /T^",,, do 
if .T^ :?t TTy then 
/7'V/7'-;r^ 
if /7' empty or if a message was received during compatibility check then 

n'f «— 7T'(x'(t'„),y(t'n : 00)) (i.e., follow the available contingency for next cycle) 
else 

n'f «— trajectory in IT' which brings R' closer to the goal given a metric 
Transmit n'^ to all neighbors in A'' and execute tt, during next cycle 



B. Safe and Asynchronous Distributed Solution: Algorithm 14. II in contrast to 
Algorithm 13. II maintains the safety invariant. The protocol follows the same high- 
level framework and still allows a variety of planning techniques to be used for pro- 
ducing trajectories. The differences with the original algorithm can be summarized 
as follows: 
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• The algorithm stores the messages received from neighbors during the previous 
cycle in the set /7^.^,, (lines 1-3). Note that the robots transmit the selected trajec- 
tory together with the corresponding contingency (lines 12-13 and 22). 

• A contingency plan y(t'^^ j : oo) is attached to every collision-free trajectory 
n'(x'{t'^),p'(tlj : f^ j)) and the trajectory concatenation n' is generated (line 5-6). 
Note that potentially multiple different contingencies can be attached to the tra- 
jectory n'{x'{f^),p'{tl^ : f^ j)). Each resulting trajectory concatenation is treated 
individually by the algorithm. 

• The trajectory n' is added to TJ' only if it is collision- free with static obstacles for 
an infinite time horizon (lines 7-8), thus guaranteeing that x[n'](t'^ j) is not ICS. 

• n' is rejected, however, if it is not compatible with all the trajectories and contin- 
gencies of neighbors from the compatibility check (lines 14-17). R' checks not just 
trajectories for the next cycle but its trajectory concatenations with contingencies 
n' against its neighbors' trajectory concatenations n^ . 

• The final change (lines 18-21) addresses the possibility that TJ' is empty or when 
a message arrives while /?' executes its compatibility check. If any of the two is 
true, then R' selects to follow the contingency -y(/,', : oo), which was used in the 
previous cycle to prove that x(fjj) was safe. Otherwise, R' selects among the set 
77' the trajectory that brings it closer to the goal according to a desired metric, 
previous cycle, stored in 77^^,, (lines 9-11). 

• The while loop (lines 4-13) is executed as long as time t is less than the end of the 
planning cycle (fj,) minus an e time period. Time e should be sufficient for the robot 
to complete the compatibility check (lines 14-17) and the selection process (lines 
18-22). If the robot is running out of time, the robot should immediately select a 
contingency in order to guarantee safety. In a real robot implementation, this can 
be achieved through an interrupt or a signal that stops execution and enforces the 
contingency. In a serial implementation e has to be sufficiently large. 

Overall, each robot selects a plan /?'(/,', '■ t'^ i) and a contingency y'it'^ j : oo) that 
respect the plans and contingencies of other robots that have been selected before 
time t'lj. If no such plan is found or there is no time to check against newly incoming 
messages, then the contingency y{t'„ : oo) is selected. 

Computational Complexity: The algorithm's complexity depends on the num- 
ber of neighbors A^', which in the worst case is the total number of robots A^. In 
order to evaluate the cost of operations involving trajectories, it is important to con- 
sider a trajectory representation. A discrete sequence of states can be sampled along 
a trajectory, given a predefined resolution in time Q (i.e., the technique becomes 
resolution-safe in this case). Then, let S be the upper limit in the number of states 
used to represent each trajectory conceternation. P denotes the upper limit in the 
number of plans considered during each planning cycle for the current agent. 

Given the above notation, the complexity of the algorithm's various operations is 
as follows: (a) Lines 2-3 : SxN, (b) Lines 7-8: PxS,{c) Lines 9-11: PxNxS^ 
(if the states in a trajectory are not accompanied by a global timestamp) or PxNxS 
(if the states are tagged with a global timestamp), (d) Lines 14-17: Same as above. 
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(e) Lines 20-21: P, assuming constant time for computing a cost-to-go metric for 
each state, (f) Line 22: NxS. 

Overall, the worst-case complexity is: PxNxS^. Note that for robots with lim- 
ited communication, the parameter A^ is reduced. Furthermore, coarser resolution 
in the representation of trajectories improves efficiency but introduces the probabil- 
ity of collision due to resolution issues. Similarly, considering fewer plans reduces 
computational complexity but reduces the diversity of solutions considered at each 
time step. Finally, lower maximum velocity or higher maximum deacceleration also 
assist computationally in the case of braking maneuvers. 

C. Guaranteeing Maintenance of tiie Safety Invariant: This section provides 
a proof that Algorithm 14. 1 1 maintains the safety invariant given some simplyfying 
assymptions that will be waived later. 



Theorem 1: Algorithm 14. II guarantees the maintenance of the safety invariant in 

every planning cycle given it holds during the cycle {tl : t'^) and that: 

i) all robots can communicate one with another, 

ii) plans are transmitted instantaneously between robots. 

Proof: The proof is obtained by 
induction. The base case holds for 
I K tn+i tn+2 tn+3 R' becausc of the Theorem's as- 

i i i sumption that the Invariant holds 

i : i : i i during cycle (fi : fj). The induc- 

*n ' tn+i ' tn+2 ' tive Step wiU show that if the In- 



-*- 



variant holds during the cycle (?J, : 

Fig. 2 The replanning cycles of two neighboring ^» 1^ ^^"^ " ^^^^ ^^^° ^°^^ '^"'"" 
robots r' and RJ . The times denote transitions be- ing the cycle (t\^ ^'.f^ ^) for Algo- 
tween planning cycles for each robot. The vertical rithm 14.11 Without loss of gener- 
arrows denote the transmission of information, e.g., ality consider Figure |2] and focus 
at 4, R' transmits n'{ n\x'(i),p'(f„ : f^^^)), y{f^^^ : on robot R' . To prove the induc- 
°°) ^- tive step, it is necessary to show 

that each one of the three points 
of the Invariant will be satisfied during (f , : f ,). For cycle (f , : f ^) there are 
two cases: (1) A compatible trajectory n' - n' G IT is selected, or (2) the current 
contingency is returned. 
Case 1: A trajectory n' G /7' is selected. 

a) Trajectory n' has to be collision-free as part of IT'. 

b) Assuming instantaneous plan transmission and by time /'^ ^ , R' has been sent and 
has available the choices of other robots for cycles that start before t'^ j . Since n' G 
77' is selected, none of these messages arrived during the compatibility check. This 
means that 7?-''s trajectory 7r-'( ;r-'(x'(/-' ,),p^{t' . : t^ 2-*-*'^*-'^ ,■ °°) ) is available 
to R' during the compatibility check. Then the cycle (f|^ j : f|^ 2) can be broken into 
two parts: 
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i) During part (f^ j : tj^ j)' the selected plan p'{f^ j : f^ j) is compatible with 

pHt^„ I '■ t^„ 2) because the second plan was known to R' when selecting n' . 
ii) For part (t' ~:f -) there are two cases for W at time f' ,: 

• W will either select a plan /'-'(?■', 2 ■ 'il^ 3) that is compatible with /7'(/'j j : f^ j)' 

• or it will resort to a contingency y^it' , : 00), which, however, is already com- 
patible with trajectory n' . 

In both cases, /?-' will follow a plan that is compatible with p'it'^^ j : ?J, ■^■ 
Thus, the second point b) of the Invariant is also satisfied for robots R' and RL 
c) For the third point of the Invariant, the contingency y^^,', 2 ■ °°^ ^^^ ^o be com- 
patible with the future choices of the other robots. Focus again on the interaction 
between R^ and /?-'. There are again two cases for /?-' at time /■' ^ : 

i) /?-' will select a plan /7-'(f';^ 2'^/, 3) and a corresponding contingency ^■'(f';^ 3 : oo)- 
This plan and contingency respect by construction /?"s contingency y'if^ 2 ■ °°^' 
since it was known to R^ at time t^ , . 

n 2 

ii) Or R^ will resort to its contingency y-'C?^ , • °°)' which, however, the contin- 
gency y'{t'^ 2''°°^ respected upon its selection. 
In any case, whatever R^ chooses at time f^ 2' i' i^ going to follow plans in the 
future that are compatible with y'if^ 2 ' °°)- Thus, point c) is also satisfied. 

Case 2: A contingency y'{t' . : 00) was selected. 

The inductive hypothesis implies that x'{t'^^ j) is a safe state. Thus: 

a) y'(t' ,: t' t) is collision-free with static obstacles 

^ ^ ^ n I n 2' 

b) The current plans of all robots will be compatible with 7'(f^ j : r^ 2)' which 
was known to them at time fj,. Furthermore, y\t'^ ^ : t'^ .,) already respects the 
contingencies of other robots that might be executed before /'^ j . 

c) The state x' [7' (f^ 1 : °°)](fj, 2) i^ trivially safe, because ^' can keep executing the 
same contingency for ever and this contingency will have to be respected by its 
neighbors, as it will always be known ahead of time. 

In both cases, all three points of the Invariant are satisfied for R' and the inductive 
step is proved. Thus, if the Invariant holds, the algorithm maintains its validity, n 

D. Addressing the Assumptions: Theorem 1 assumed that messages are trans- 
mitted instantaneously and that all the robots communicate one with another. The 
assumption that plans are transmitted instantaneously will not hold in real-world 
experiments with wireless communication. Similarly, it is more realistic to assume 
that robots can communicate only if their distance is below a certain threshold. In the 
latter case, the proposed approach can be invoked using only point to neighborhood 
communication and thus achieve higher scalability. The following theorem shows 
that the safety guarantees can be provided without these restrictive assumptions. 



Theorem 2: Algorithm 14.11 guarantees the maintenance of the safety invariant in 
every planning cycle given it holds during cycle (t'^ : t'^) and that: 
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i) two robots with limited communication ranges can communicate before they en- 
ter into ICS given a predefined set of contingencies r(oo), 
ii) robots utilize acknowledgments upon the receipt of a trajectory by a neighbor. 

Sketch of Proof: Theorem 1 showed that the invariant holds as long as it was valid 
during the first cycle (fg : r'j ) and that two vehicles can communicate continuously 
since r^. For two robots with limited communication range, denote as time tcomm the 
beginning of the first planning cycle of either robot after they are able to communi- 
cate. If at tcomm, both robots have available a contingency 7(00) g /"(oo), that can be 
used to prove the safety of their corresponding states, then all the requirements of 
Theorem 1 are satisfied for f^ - tcomm- Thus the invariant will be maintained. 

Regarding the issue of delayed 
messages, consider the case that 

R^'s cycle ends at time f/,, which , —jti 

is before the end of the neighbor- ^^ 

ing R"s cycle at time 4- Figure [3] R' . ^p . 

provides an example. If the trans- ti 

mission of the trajectory n-' to /?' is „. , .^ r. .u . . r ■ u 

■' -^ t ig. 3 Ir messages arrive alter the start 01 a neigh- 

delayed, it might arrive after time ^^^,^ ^^^^^^ ^^^j^^ ^^ ^-^^ ^j,^ ^^^^^^^ ^^^^^ ^,/ ^^ 

r; and R' cannot detect that it did ^/ ^(,ove, this is problematic. 
not take into account the choice of 
R^ during its compatibility check 

given Algorithm 14.11 Thus, R''s choice might end up being incompatible with 
n^ . This problem becomes more frequent for robots that have synchronized cy- 
cles. Nevertheless, if an acknowledgment that signals the reception of a tra- 
jectory by a neighbor is used R' can acknowledge the message's reception, 
whether it arrives before or after f,',. If the acknowledgment arrives at R^ be- 
fore tf,, it knows that it is safe to execute n' . If the acknowledgment is not re- 
ceived on time, R^ can revert to its contingency which is by construction re- 
spected by the future plan of R', whatever this is. Thus, the introduction of 
an acknowledgment resolves the issue of possible delays in the transmission of 
trajectories. n 



5 Experimental Results 

To validate the theoretical discussion, simulated experiments were conducted. Our 
first experiments revealed performance deficits, however, practical modifications in 
the implementation of the algorithm were made. These resulted in significant speed 
ups and quick convergence to a solution. 

Implementation Specifics: This section describes some steps to make the 
implementation of Algorithm l4.1l more efficient computationally. In particular: 
• Instead of checking all the candidate plans 77' with the trajectories of the neigh- 
bors 77,^,,,, only the best plan in 77' according to a metric is checked. If this plan 
fails the check, then the previous contingency is selected. 
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• At each step of the "while" loop in Algorithm 14. 1 1 (lines 4-13), the implementa- 
tion propagates an edge along a tree of trajectories using a sampling-based planner, 
instead of generating an entire trajectory. If the edge intersects f^ j , a contingency 
y(f'j J : oo) is extended from x{f^ j). If the contingency is collision- free and com- 
patible with the available trajectories of neighbors in /I^gy, x(t'^ j ) is assumed safe. 
Otherwise, it is unsafe and no future expansion of an edge is allowed past x{f^ j). 

• The sampling-based expansion of the tree structure of trajectories is biased given 
a potential field in the workspace that promotes the expansion of the tree towards 
the goal ||2i. The tree expansion is also biased away from other vehicles. Different 
algorithms can be considered for the actual planning process ll20l[T5l [8l l23l . 

• Each robot maintains a common buffer for the sets -P™,, and P^^^ from each 
neighbor. As new trajectories are transmitted, they replace the part of old trajecto- 
ries that has already been executed by a neighbor along the buffer. 

• The latency in the experimental setup was relatively low. Thus, the situation in 
Figure [3] did not arise. Thus, the acknowledgement step was not included for the 
experiments presented below, which reduced the number of peer-to-peer messages. 

Modeled System: The experiments presented in this paper are 

using the model of a second-order car like vehicle ||T9 l shown ^ _ w-rnsC-COsO 

on the right side, where {x,y) are the car's reference point in v, = wcosOsinO 

Cartesian coordinates, 6 is the car's orientation, w its velocity Q = wsin( 

and f the steering angle. The controls are a, the acceleration, w = a 

and (p the rate of change of the steering angle. There are limits C~ 'P 

both for state and control parameters (HvvH < w,„ux, WCW < Cmax, 

llo'll < amax, \\<P\\ < (pmcix)- All robots havc range-limited communication out to 30% 

of the total environment width, and brake to zero speed for contingency. 

Environments. Four simulated environments 
were used for the experiments: 
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1. An "empty" environment (Fig.|4](left)), 

2. an "office" environment (Fig. [Hi, 

3. a " random" environment (Fig. |4] (right)), and 

4. an "intersection" environment with two 
crossing corridors (Fig.|5]). 

These environments are presented in approximate Fig. 4 Starting positions for the 
order of difficulty. The various experiments tested "empty" and "random" environ- 
different numbers of vehicles: 2, 4, 8, 16, 32, 48. m^nts. 
Because the 16 robots alone took up 6% of the en- 
tire workspace (ignoring obstacles), the size of the 

robots was reduced to half for the 32 robot case, and to a quarter of their size for 
the 48 robot case. If this was not done, then the robots would take up 12% and 18% 
of the workspace, respectively. Since much of the workspace is already occupied by 
obstacles, this reduction in size assists in reducing clutter effects that effect solution 
time. The empty environment was the easiest to solve. The office environment was 
chosen as a gauge for how hard a structured environment can be. The robots, in their 
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original size, are about 1/5 of the size of the hallway. In the random environment, 
there were polygons of varying shapes and sizes. The intersection case seemed to be 
the hardest to solve, since the robots not only have to navigate through a relatively 
narrow passage together with their neighbors, but they are all forced to traverse the 
center, almost simultaneously. 




■a 







Fig. 5 Snapshots from a typical run with 32 robots; Final image is the full trajectory of 
robot 0. 



When possible, starting/goal locations were identical across runs as more robots 
were added. Experiments for the same number of robots have the same start/goal 
locations. All experiments were repeated at least 10 times. The algorithm was run 
in real time such that computation time is equal to execution time. 

Evaluation of Safety. To verify that the system implemented truly provides the 
guarantees presented in this paper, three different cases were considered for the 
algorithm: (i) an implementation without contingencies, (ii) with contingencies but 
for robots with synchronized cycles and (iii) with contingencies and robots that 
are not synchronized. For each type of experiment the following figure reports the 
percentage of successful experiments. 20 experiments were executed for each case, 
averaging across synchronous and asynchronous cases. The results presented clearly 
indicate that enabling contingencies results in a safe system in all cases. 
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Scalability and Efficiency. Once the safety of the approach was confirmed, the 
focus turned on evaluating the effects of contingencies. A high-selection rate of 
contingencies is expected to decrease the performance of the robots, as these plans 
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are not selected to make progress towards the goal. The following table presents the 
average duration of experiments in seconds and the average velocity achieved by 
the robots both for the case without contingencies and the case with contingencies 
(both for synchronized and asynchronous robots). The performance data without 
contingencies is from the cases where none of the robots entered ICS, which means 
they often correspond to fewer than 20 experiments, and in some cases there is no 
successful experiment without contingencies to compare against. 
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The behavior of the robots is indeed more conservative when contingencies are 
employed and it takes longer to complete an experiment. Although the algorithm 
has no progress guarantees, the randomized nature of the probabilistically com- 
plete planning algorithms helped to offset this. The simulations always eventually 
resulted in a solution for the tested problems even if the robots temporarily entered 
oscillatory motions. The local penalty for trajectories that brought an agent in close 
proximity to neighboring robots helped to reduce the occurrence of oscillations and 
resulted in significant improvements in performance. 

Synchronous vs. Asynchronous. Another objective of the experimentation proce- 
dure was to evaluate the differences in the performance of the algorithm between the 
synchronous and the asynchronous case. In the synchronous case, all robots have a 
zero time offset but they are not aware of their synchronicity and they are not taking 
advantage of it as in previous work ISj. In the asynchronous case, the offsets are the 
same across 10 averaged runs. These offsets are randomly precomputed and range 
from to a maximum of 3/4 of the planning cycle. 

When the robots' cycles are synchronized, then it will be often the case that 
robots are transmitting simultaneously, and potentially during the compatibility 
check of their neighbors. This in certain cases results in slightly longer durations 
for the completion of an experiment, as well as lower average velocities, but overall 
there is no consistent effect as in the random and empty scenes, there is a perfor- 
mance boost under synchronous operation, especially as the number of robots in- 
creases. In comparison to previous work |[3l where synchronicity was specifically 
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taken advantage of, it is clear that the quality of the paths selected are worse in the 
current asynchronous implementation. However, it is expected that further research 
in asynchronous coordination algorithms can reduce this performance gap. 

Scaling. Larger scale simulations for 32 and 48 robots were run to study the al- 
gorithm's scalability. For these cases, the approach without contingencies always 
fails. Note that as mentioned earlier, these robots are of reduced size to decrease the 
effects on completion time due to a cluttered environment. 

Achieving safe, asynchronous operation 
for 48 second-order systems with the pro- ^ooo 
posed setup is a challenge. The agent 
model is complex as are the safety guar- ~ 
antees address the ICS issue. The Simula- ^ 
tion environment mimics the constraints of '^ '^^ 
real-world communication by running each *'° 
agent on a separate processor and allow- o 

ing only message-passing communication 
(TCP sockets). An experiment with 48 robots 
requires 49 separate processors (1 processor 
is used as a simulation server). 

Parameter Evaluation. An important parameter for the proposed approach is the 
duration of the planning cycle. For shorter durations of cycles, there was a higher 
deviation between runs and it was not possible to execute the larger experiments 
with 32 and 48 robots for a cycle duration less than 2 seconds. This limitation is 
due to the single thread running the world simulation. It is expected that the limit 
in hardware implementation would be dependent on the communication latency. 
The average completion time shows a noticeable increase as the duration of a cycle 
increases. The experiments presented in the previous tables were executed for a 
cycle duration of 2.5 seconds. 
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6 Discussion 

This paper presented a fully distributed algorithm that guarantees ICS safety for 
a number of second-order robots that move in the same environment. Simulations 
confirm that the framework indeed provides safety and is scalable and adaptable. 
Additional experiments not presented above were conducted for a system with pos- 
itive minimum velocity, i.e., a system that cannot brake to zero velocity. Safety was 
achieved for this system using a different set of contingencies than braking maneu- 
vers. In this case, the system was required to turn into the tightest circle possible 
without exceeding the specified limits on velocity and turning rate. Future work 
includes: (a) considering robots with difi^erent durations for planning cycles, (b) 
dealing with unreliable communication, (c) studying the effects of motion uncer- 
tainty to the protocol's performance, (d) distributed optimization for improving the 
quality of paths selected despite the asynchronous operation, (e) dealing with non- 
cooperating vehicles and (f) addressing tasks that go beyond moving from initial to 
final states. Experiments using physical systems with interesting dynamics would 
provide a real-world verification of the approach. 
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Incremental Sampling-Based Algorithms for a 
Class of Pursuit-Evasion Games 



Sertac Karaman and Emilio Frazzoli 



Abstract. Pursuit-evasion games have been used for modeling various forms of 
conflict arising between two agents modeled as dynamical systems. Although ana- 
lytical solutions of some simple pursuit-evasion games are known, most interesting 
instances can only be solved using numerical methods requiring significant offline 
computation. In this paper, a novel incremental sampling-based algorithm is pre- 
sented to compute optimal open-loop solutions for the evader, assuming worst-case 
behavior for the pursuer. It is shown that the algorithm has probabilistic complete- 
ness and soundness guarantees. As opposed to many other numerical methods tai- 
lored to solve pursuit-evasion games, incremental sampling-based algorithms offer 
anytime properties, which allow their real-time implementations in online settings. 

1 Introduction 

Pursuit-evasion games have been used for modeling various problems of conflict 
arising between two dynamic agents with opposing interests i^lM- Some examples 
include multiagent collision avoidance yfl, air combat j^, and path planning in an 
adversarial environment ISfl. The class of pursuit-evasion games that will be consid- 
ered in this paper is summarized as follows. Consider two agents, an evader and a 
pursuer; the former is trying to "escape" into a goal set in minimum time, and the 
latter is trying to prevent the evader from doing so by "capturing" her, while both 
agents are required to avoid collision with obstacles. The evader is only aware of 
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the initial state of the pursuer, while the pursuer has access to full information about 
the evader's trajectory. This class of pursuit-evasion games is of interest when the 
evader can be easily detected by stealthy pursuers, who operate from known loca- 
tions. Problems in this class include the generation of trajectories for an airplane to 
avoid threats from known Surface-to- Air Missile (SAM) sites, or for a ship to avoid 
attacks by pirates based at known locations. The information structure of this class 
of pursuit-evasion games is such that the evader discloses her (open-loop) strategy 
first, and the pursuer decides his strategy accordingly. In this setting, the evader's 
strategy should be chosen carefully, considering the worst-case (from the evader's 
point of view) response of the pursuer. Rational players in this game will choose a 
Stackelberg strategy with the evader as a leader [g] . 

Analytical solutions to certain classes of pursuit-evasion games, e.g., the "homi- 
cidal chauffeur" and the "lady in the lake" games, exist yl |2[]. However, for 
problems involving agents with more complex dynamics, or for problems involving 
complex environments (e.g., including obstacles), existing analytical techniques are 
difficult to apply. For example, the pursuit-evasion game addressed in this article can 
be solved in principle by determining the set of all states that can be reached by the 
evader before the pursuer, and then choosing the optimal trajectory for the evader, 
if one exists, within this set [1']. In the simple case of kinematic agents moving with 
bounded speed within an empty environment, such a set coincides with the evader's 
region in a Voronoi tesselation generated by the evader's and pursuer's initial con- 
ditions. However, analytical methods for computation of this set are not available in 
the general case in which non-trivial dynamics and obstacles are considered. 

Standard numerical approaches for solving pursuit-evasion games are based on 
either direct or indirect methods fT] . The former reduce the problem to a sequence 
of finite dimensional optimization problems through discretization 18J, whereas the 
latter solves the Isaacs partial differential equation with boundary conditions using, 
e.g., multiple shooting jgl llOl] . collocation llUll2[1 . or level-set methods JsL llSn . 

A number of algorithms for motion planning in the presence of dynamic, possi- 
bly adversarial obstacles, have been proposed in the context of mobile robotics. A 
common approach relies on planning in a 'space-time' state space, avoiding spatio- 



temporal regions representing possible motions of the dynamic obstacles Ill4ill5[l . 
However, such regions, representing reachable sets by the dynamic obstacles, are 
typically hard to compute exactly in the general case, and conservative approxima- 
tions are used, e.g., to estimate regions of inevitable collision 1161 . Other recent 



contributions in this area include 11171 - 1241] 



Several types of pursuit-evasion games have been studied from an algorithmic 
perspective. In particular, pursuit games on graphs [25l-:27.1 as well as on polygo- 



nal environments Il28[430ll have received significant attention during the last decade. 
More recently, pursuit-evasion games on timed roadmaps have also been consid- 
ered Oin . All these approaches typically impose severe limitations on the allowable 
agents' dynamics, e.g., by considering only finite state spaces and discrete time. 

Based on recent advances in incremental sampling-based motion planning al- 
gorithms, we propose a new method for solving the class of pursuit-evasion games 
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under consideration. In fact, the game we consider is a generalization of the kin- 
odynamic motion planning problem |15]. During the last decade, a successful al- 
gorithmic approach to this problem has been the class of sampling-based methods 
including, e.g.. Probabilistic RoadMaps (PRMs) 132[], which construct a roadmap by 
connecting randomly-sampled states with feasible trajectories so as to form a strong 
hypothesis of the connectivity of the state space, and, in particular, the initial state 
and the goal region. 

Incremental versions of sampling-based motion planning methods were proposed 



to address on-line planning problems lll7ll331J[n particular, the Rapidly-exploring 



Random Tree (RRT) algorithm proposed in Il33ll has been shown to be very effec- 
tive in practice, and was demonstrated on various platforms in major robotics events 



(see, e.g., 11341] '). Very recently, optimality properties of incremental sampling-based 
planning algorithms were analyzed and it was shown that, under mild technical as- 
sumptions, the RRT algorithm converges to a non-optimal solution with probability 



one 113511 . In BSD , the authors have proposed a new algorithm, called RRT*, which 
converges to an optimal solution almost surely, while incurring essentially the same 
computational cost when compared to the RRT. The RRT* algorithm can be viewed 
as an anytime algorithm for the optimal motion planning problem. Loosely speak- 
ing, an anytime algorithm produces an approximate solution and gradually improves 



the quality of the approximation given more computation time 1136113711 . The quality 
measure is defined, e.g., with respect to a cost function. 

In this paper, inspired by incremental sampling-based motion planning methods, 
in particular the RRT* algorithm, we propose an incremental sampling-based algo- 
rithm that solves the pursuit-evasion game with probabilistic guarantees. More pre- 
cisely, if evader trajectories that escape to the goal set while avoiding capture exist, 
then the output of the algorithm will converge to the minimum-cost one with prob- 
ability approaching one as the number of samples increases.To the best of authors' 
knowledge, this algorithm constitutes the first algorithmic approach to numerically 
solve, with both asymptotic and anytime guarantees, the class of pursuit-evasion 
games under consideration. 

The paper is organized as follows. Section |2] formulates the problem. The pro- 
posed solution algorithms are introduced in Section [3] The algorithm is shown to 
be probabilistically sound and probabilistically complete in Section |4l Simulation 
examples are provided in Section [S] Conclusions and remarks on future work can 
be found in Section |6l 



2 Problem Definition 

We consider a two-player zero-sum differential game in which one of the players, 
called the evader, tries to escape in minimum time to a goal set, while the second 
player, called the pursuer, tries the capture the evader before it reaches the goal set. 
More formally, consider a time-invariant dynamical system described by the dif- 
ferential equation x{t) = f{x{t),Ue{t),Up{t)), where x : f i— > x{t) G X CW^ is the 
state trajectory, u^ '■ t ^^ Ue{t) G Us C K'"^ is the evader's control input, Wp : f !—> 
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Mp(?) G f/p C R^p is the pursuer's control input. The sets X, Ue, and Up are assumed 
to be compact, the control signals Me and Up are essentially bounded measurable, 
and f{z,We,Wp) is locally Lipschitz in z and piecewise continuous in both We and 
Wp. Consider also an obstacle set Xobs, a goal set Xgoai, and a capture set Xcapt; these 
sets are assumed to be open subsets of Z, and Xgoai and X^apt are disjoint. 

Given an initial condition .x(0) G X \Xobs and the control inputs of the evader and 
the pursuer, a unique state trajectory can be computed. The final time of the game is 
given by r = inf{f G R>o : x{t) G cl (Xgoai UXcapt) }■ Since this is a zero-sum game, 
only one objective function will be considered, defined as follows: L{us,Up) = T, 
if x(r) G cl(Xgoai); and L{ue,Up) = +°°, otherwise. The evader tries to minimize 
this objective function by escaping to the goal region in minimum time, while the 
pursuer tries to maximize it by capturing the evader before she reaches the goal. 

Let BR : Us ^ Up denote a transformation that maps each evader trajectory to 
the best response of the pursuer, i.e., BR(Me) := argmax„ L(Me,Mp)- In the game 
described above, the evader picks her strategy so thatL* = L(m*,BR(m*)) < L{ue,ep) 
for all Me and all Wp. Let u* :— BR(m*). Then, (m* and u*) are called the (open-loop) 
Stackelberg strategies of this differential game ||2[]. 

Note that open-loop Stackelberg strategies computed for the evader in this way 
would be conservative when compared to the saddle-point equilibrium of a pursuit- 
evasion game with feedback information pattern (see |l2j]). Open-loop Stackelberg 
strategies correspond to trajectories that would allow escape without any additional 
information on the pursuer other than the initial condition. Should other information 
become available, or should the pursuer not play optimally, the time needed to reach 
the goal set may be further reduced. In addition, even in the case in which escape is 
unfeasible (i.e., L* = +00) under the open-loop information structure for the evader, 
there may exist feedback strategies that would allow the evader to escape while 
avoiding capture. 

As common in pursuit-evasion games, the problem considered in this paper fur- 
ther possesses a separable structure, in the following sense. It is assumed that the 
state can be partitioned as x = (xe,Xp) G Xe x Xp = X, the obstacle set can be 
similarly partitioned as Xobs = (^obs.e x -^p) U [X^ x Xobs.p), where Xobs.e C X^ and 
Xobs.p C Xp, the goal set is such that Xgoai = (^e^goai x Xp) \Xcapt, where X^^goai C Zg, 
and the dynamics are decoupled as follows: 



d , . d 
dt dt 



Xe(f) 
^p(0 



:/(x(0,m(0) 



/e(Xe(0,"e(0) 
fp{xp{t),Up{t)) 



for all r G : 



^>o, 



It is also assumed that the initial condition is an equilibrium state for the pursuer, 
i.e., there exists u'p G Up such that/p(xinit,p,Mp) = 0. 

Assume that there exist a Stackelberg strategy enabling the evader to escape (i.e., 
L* < +0°). An algorithm for the solution of the pursuit-evasion game defined in this 
section is said to be sound if it returns a control input u'^ such that max„ L(Me, Mp), 
is finite. An algorithm is said to be complete if it terminates in finite time returning 
a solution uL as above if one exists, and returns failure otherwise. 
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The pursuer dynamics can be used to naturally model one or more pursuing 
agents, as well as moving obstacles whose trajectories are a priori unknown. It is 
known that even when the number of degrees of freedom of the robot is fixed, the 
motion planning problem with moving obstacles is NP-hard, whenever the robot 
has bounds on its velocities. In fact, a simple version of this problem, called the 2-d 
asteroid avoidance problem, is NP-hard [38]. 

The discussion above also suggests that complete algorithms aimed to solve the 
proposed pursuit-evasion game will be computationally intensive. To overcome this 
difficulty, we propose a sampling-based algorithm, which is both probabilistically 
sound, i.e., such that the probability that the returned trajectory avoids capture con- 
verges to one, and probabilistically complete, i.e., such that the probability that it 
returns a solution, if one exists, converges to one, as the number of samples ap- 
proaches infinity. Finally, the proposed algorithm is asymptotically optimal in the 
sense that the cost of the returned trajectory converges to the value of the game L*, 
almost surely, if L* < +00. 



3 Algorithm 

In this section, an algorithm that solves the proposed pursuit-evasion game with 
probabilistic soundness and completeness guarantees is introduced. This algorithm 



is closely related to the RRT* algorithm recently introduced in BSD , which will be 
discussed first. RRT* is an incremental sampling-based motion planning algorithm 
with the asymptotic optimality property, i.e., almost-sure convergence to optimal 



trajectories, which the RRT algorithm lacks 03511 . In fact, it is precisely this property 
of the RRT* that allows us to cope with the game introduced in the previous section. 

Before formalizing the algorithm, some primitive procedures are presented be- 
low. Let a e {e,p} denote either the evader or the pursuer. 

Sampling: The sampling procedure Sample^, :N ^ Xa returns independent and 
identically distributed samples from Xa- The sampling distribution is assumed to be 
absolutely continuous with density bounded away from zero on Xa- 

Distance Function: Given two states zi and Z2, let dist« (21,22) be a function 
that returns the minimum time to reach zi starting from zi , assuming no obstacles. 
Clearly, the distance function evaluates to the Euclidean distance between zi and Z2 
when/„(x„,M„) = Maand \\ua\\ < 1. 

Nearest Neighbor: Given a tree G = {V,E), where V C Xa, and a state z £ Xa, 
Nearesto;(G,z) returns the vertex v eV that is closest to z. This procedure is de- 
fined according to the distance function as NearestQ;(G,z) = argminyev dist(v,z). 

Near-by Vertices: Given a tree G = {V,E), where V C Xa, a state z £ Xa, and a 
number « £ N, Near^ (G,z,m) procedure returns all the vertices in V that are suf- 
ficiently close to z, where closeness is parameterized by n. More precisely, for any 
zeXa, letReacha(z,/) := {z' GX |dist(z,z') </ Vdist(z',z) < /}. Given, z and «, 
the distance threshold is chosen such that the set Reach«(z,/(«)) contains a ball of 
volume 7a: -^, where Ya is an appropriate constant. (This particular scaling rate 
is chosen since it ensures both computational efficiency and asymptotic optimality 
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of the RRT* algorithm 1351 |39tl.) Finally, we define Nearc((G,z,n) := 
ynReach„(z,/(M)). 

Collision Check: Given a state trajectory x : [0,t] -^ Xa, the ObstacleFreeo;(x) 
procedure returns true if and only if x lies entirely in the obstacle-free space, i.e., if 
and only if x(f') ^ Xobs.a for all f' S [0,t]. 

Local Steering: Given two states zi,Z2 &Xa, the St eer^ (21,22) function returns 
a trajectory that starts from zi and ends at Z2, ignoring obstacles. We assume that the 
Steer procedure returns a time optimal trajectory that connects zi and zi exactly if 
zi and Z2 are sufficiently close to each other. More precisely, there exists an £ > 
such that for all ||zi — Z2|| < £> the Steer (21,22) procedure returns {x,u,t) such that 
x{0) =z\,x{T) — Z2, andi(f') = fa{x{t'),u{t')) for all/' G [0,/], and/ is minimized. 

Given a vertex v, let Xy be the unique trajectory in the tree that starts from the root 
vertex and reaches v. Let us denote the time that Xy reaches v by T{v); given a state 
trajectory x: [0,t] -^ X, let us denote the ending time t with EndTime(x). 

If the pursuer is inactive (e.g., it is not moving), the pursuit-evasion problem in 
Section [2] reduces to a standard time-optimal kinodynamic motion planning prob- 
lem. The RRT* algorithm that solves this problem is presented in Algorithm[T] 

The RRT* algorithm proceeds similarly to other incremental sampling-based mo- 
tion planning methods (e.g., the RRT 11331] ') by first sampling a state a from the 
obstacle-free space (Line|4l) and then extending the tree towards this sample (Line 
[5]). The extension procedure of the RRT*, presented in Algorithm|2l first extends the 
vertex closest to the sample (Lines |2][3]l; if the extension is collision-free (Line|4l), 
then the end point of the extension, say Znew, is added to the tree as a new vertex 
(Line|5ll, as in RRT. However, the RRT* Extend^ procedure differs from others in 
that it connects the new vertex Znew to the vertex that lies within a ball of volume 
0(log(«)/«) centered at Znew, where n — \V\ is the number of vertices in the tree, 
and incurs the smallest cost to reach Znew with a collision-free trajectory (Lines [S]- 
\T2\1 . Moreover, the RRT* Extend^ procedure extends Znew back to the vertices in 
the tree that are within the ball of same size centered at Znew ; if the extension to such 
a vertex, say Znear, results in a collision-free trajectory that reaches Znear with smaller 
cost, then tree is "rewired" by connecting Znear to Znew, instead of its current parent 
(Lines[T3]-[l8ll. 

The algorithm that is proposed for the solution of the problem in Section[2]builds 
on RRT*, and relies on the following additional primitive procedures. 

Near-Capture Vertices: The NearCapure^, procedure works in a way that is very 
similar to the Near« procedure. Given a tree G = {V,E), a state z G Xa, and a num- 
ber n, the NearCapture„ (G,z,n) procedure returns all vertices z' that are "close" 
to being captured from z. In other words, and assuming a — p for simplicity, let 
CaptureSetp(z) := {z' G X^ : {z',z) G Xcapt}- Then, NearCapturep(G,z,n) = {v G 
y I there exist y G CaptureSet (z) such that v G Reache (>',/(«))}. 

Remove: Given a graph G — {V,E) on Xa, and a vertex z € V, the procedure 
Remove(G,z) removes z, all its descendants, and their incoming edges from G. 

The algorithm proposed to solve the pursuit-evasion game under consideration is 
given in Algorithm [3] The algorithm maintains two tree structures encoding can- 
didate paths: the evader tree Ge and the pursuer tree Gp. At each iteration, the 
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Algorithm 1. The RRT* Algorithm 



1 Ve^{zi„i,};£e^0;/^O; 

2 while I < A' do 

3 Ge^(Ve,£e); 

4 Ze.rand^Samplee(0; 

5 (Ve,£e,Ze,new) ^ Extende(Ge, Ze,rand) 

6 i^i+l; 



Algorithm 2. Extenda (G,z) 


1 V'^V;£-'^£; 


2 Znearest^Nearesta(G,z); 


3 (-'^new^^'newj^newj * Steer o:(Znearest5 Zj; Znew ^ -"'^newv^newji 


4 if ObstacleFreea(jnew) then 


5 


V'^V'U{Znew}; 


6 


^min ^ Znearesti '^'min ^ -^ (Znewji 


7 


^nearby ^ Neara(G,Znew, |V|); 


8 


for all Znear £ Znearby dO 


9 




(,-^neari ^near linear) ^ Steer q; (Znear^Znewji 


10 




if ObstacleFreec((x„f.ar) and Xnear(«near) = Znew and 
r(znear)+EndTime(x„ear) < r(znew) then 


11 






fmin ^ r(znear)+EndTime(.K:near); 


12 






Zmin ^ Znear; 


13 


E' ^f'UKzminiZnew)}; 


14 


for all Znear £ ^nearby \ {Zmin} dO 


15 




(-^neari ^neari^nearj ^ Steer o; (ZnewiZnearj* 


16 




if ObstacleFree„(.i:near) anrf%ear(rnear) = Znear and 
r(Znear) > r(znew) + EndTime(xnear) then 


17 






Zparent ^ Parent (znear); 


18 






E' ^£'\{(Zparent, Znear)}; E' ^- £' U {(Znew, Znear)}; 


19 else 


!0 |_ Znew = NULL; 


!1 return G' = (V",£", Znew) 



algorithm first samples a state, Ze.rand^ in the evader's state-space (Line ^ and 
extends the evader tree towards Ze.rand (Line O. If the extension produces a new 
vertex Ze,new (LineO, then the algorithm checks whether the time that the evader 
reaches Ze.new is less than that at which the pursuer reaches any pursuer vertex within 
certain distance to Ze,new (Lines iTlfTOll. This distance scales as 0(log(n)/n), where 
n is the number of vertices in the pursuer tree, Gp. If this condition does not hold, 
then Ze,new is removed from evader's tree (LinefTOt. 

Second, the algorithm samples a new state, Zp.rand, in the pursuer state space 
(Line fTTI) and extends the pursuer's tree towards Zprand (Line fT2t . If this exten- 
sion successfully produces a new vertex, Zp.new (LinefTSl). then the algorithm checks 



78 



S. Karaman and E. Frazzoli 



Algorithm 3. Pursuit-Evasion Algorithm 


1 Ve ^ {xe.i„„}; £e ^ 0; Vp ^ {.Yp,,,;,}; £p ^ 0; / ^ 0; 

2 while I < A' do 

3 Ge^(Ke,£e);Gp^(Vp,£p); 

4 Ze.rand^Sampleg(0; 


5 
6 


(Ve,£e,Ze,new) ^ Extende(Ge, Ze,rand); 

ifze.newT^NULLthen 


7 




Zp.neai- ^ NearCaptureg(Gp, Ze.new, |Vp|); 


8 




for all Zp.near G Zp^near dO 


9 






if Time(zp,near) < Time(ze,new) then 


10 






|_ Remove(Ge,Ze,new); 


11 


Zp ^ Samplep(0; 


12 


(Vp,£p,Zp,new) ^Extendp(Gp,Zp_rand) ; 


13 


if Zp.new 7^ NULL then 


14 




Ze.near ^ NearCapturep(Ge, Zp,new, |Vel); 


15 




for all Ze.near G -Ze^near dO 


16 






if Time(zp,new) < Time(ze,near) then 


17 






Remove (Ge,Ze,near); 


18 


_ i^i+U 


19 return Ge, Gp 



whether the evader can reach any of the evader vertices that He within a certain 
distance to Zp.new in less time than the pursuer can reach Zp.new (Lines fT4lfT7l l. Any 
evader vertex that is within a certain distance to Zp.new and that does not satisfy this 
requirement is removed from the tree with its descendants (LineflTl). The distance 
scales as 0(log(«)/«), where n is the number of vertices in the evader's tree, Gg. 

The algorithm returns two trees, namely Ge and Gp. From the evader's tree Ge, 
the control strategy that makes the evader reach Xg^ai in minimum time (if one exists) 
is the solution candidate after N iterations. 



4 Analysis 

In this section, theoretical guarantees of the algorithm are briefly outlined. Due to 
lack of space, detailed proofs of the results are left to a full version of this paper. 

Let us note the following technical assumptions, which we will assume through- 
out this section without reference. Firstly, it is assumed that the dynamical sys- 
tems modeling the evader and the pursuer independently satisfy local controllability 
properties. Secondly, we will assume that there exists a Stackelberg strategy for the 
pursuit-evasion game with finite value of the game L*, and such that sufficiently 
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small perturbations to the strategy also yield a finite value. A formal statement of 
these assumptions can be found (for the optimal motion planning case) in 1321 • 

First, note the following lemma stating the optimality property of the RRT* al- 
gorithm (Algorithm [ll when the algorithm is used to solve a time-optimal kinody- 
namic motion planning problem. Let G[i] = {V[i],E[i]) denote the tree maintained 
by the RRT* algorithm at the end of iteration i. Given a state z ^ X , let T* (z) denote 
the time an optimal collision-free trajectory reaches z, i.e., T*{z) := inf„{r \x{T) = 
zandi:(f) = f{x{t),u{t)),x{t) ^ Xobs for all f G [0,r]}.Letz e V[j] be a vertex that 
is in the tree at iteration j. The time that the unique trajectory that is in G[i\ for some 
/ G N and that starts from the root vertex and reaches z is denoted by T{z) [i] . 

The following theorem follows directly from the asymptotic optimality of the 
RRT* algorithm shown in BOll . Let ju(-) denote the Lebesgue measure. 



Theorem 1 (Asymptotic Optimality of RRT* IM)- If J > 2'^ {\ + \/d)ji{X\ 
^obs)> the event that for any vertex z that is in the tree in some finite iteration j 
the RRT* algorithm converges to a trajectory that reaches z optimally, i.e., in time 
T*{z), occurs with probability one. Formally, 

P({lim,-^^r(z)[/ + ;] = r(z), VzGy[7]}) = l, V;gN. 

Let Ta{za)['\ denote the time at which the vertex Za in Vaf] is reached, for a G 
{e,p}, and let T*{za) be the time the time-optimal collision-free trajectory reaches 
Za (disregarding the other agent). Theorem[T]translates to the evader tree in a weaker 
form: 

Lemma 1. Under the assumptions ofTheorem\l\ applied to the evader tree, 

P({limre(Ze) ['■ + ;•] > r(Ze), VZe £ V[j]]) = 1, Vj G N. 

Lemma [T] follows directly from Theorem [T| noting that the evader's tree can only 
include fewer edges (due to removal of evader's vertices near capture), when com- 
pared to the standard RRT* algorithm. 

A similar property can be shown in terms of capture time estimates. Given Ze G 
Xe, define CaptureSetg(ze) as the set of all states in X^ reaching which the pursuer 
can capture the evader, and let C* (ze) denote the minimum time at which this capture 
can occur, i.e., C*(ze) := infu {T I Xp(r) G CaptureSet (ze)}- 

Lemma 2. Le/Cp(ze)['] :=min{rp(zp)[/] | Zp G NearCaptureg(Gp[/],Ze,0}- Then, 
under the assumptions ofTheorem\l\ applied to the pursuer tree, 

P({limCp(Ze)[i] =C*(Ze)}) - 1, VZe eXe. 

Proof (Sketch). Let the set DomainNearCaptureg(z, n) be defined asjzp G X^ \ 3y G 
CaptureSetg(z),Zp G Reachp (}',/(«))}, where l{n) was introduced in the definition 
of the NearCapture procedure. Note that (i) DomainNearCaptureg(Gp[i],Ze,0 ^ 
CaptureSetj.(ze) for all / G N, and (ii) n,GNDoniainNearCapturej.(Gp[/],Ze,0 = 
CaptureSetg(ze)- Thus, the set DomainNearCaptureg(Gp[/],Ze,0 converges 
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to CaptureSetj,(ze) from above as ; -^ o°. Let X*i{ze) be the subset of 
CaptureSetg(ze) that the pursuer can reach within time C* (ze)- The key to proving 
this lemma is to show that the set DomainNearCaptureg(Gp[(],Ze, is sampled in- 
finitely often so as to allow the existence of a sequence of vertices that converges to 
a state in X*^. Then, for each vertex in the sequence, by Theorem[Tl the RRT* algo- 
rithm will construct trajectories that converge to their respective optimal trajectory 
almost surely, which implies the claim. 

To show that the sets DomainNearCapturej.(Gp[/],Ze,0 ^^^ sampled infinitely 
often as / -^ oo, note that the probability that there is no sample inside the 

set DomainNearCapture(Gp[/],Ze,0 is (1 - t^^)'- In addition, Z (l - 

jt|t i^\ ^ ^i^i/i^jnx^ J5 fjjjjfg fQ^ ^^^^^ > ^(^Xp). Thus, by the Borel-CantelU 

lemma |4(|], the event that there are no samples inside NearCapture(Gp[/],Ze,j) 
occurs only finitely often with probability one; hence, the same sequence of sets is 
sampled infinitely often with probability one. D 

The next lemma states that all vertices satisfy the soundness property. 

Lemma 3. Let Bj denote the following event: for any vertex Ze that is in evader's 
tree by the end of iteration j, if the pursuer can reach Ze before the evader, then 
Cp(ze)['] converges to a value that is smaller than the value that T^{ze)[i\ converges 
to as i approaches infinity, i.e., Bj :~ {{{C*{ze) < T*(ze)) =^ (lim,'^ooCp(xe) < 
lim,-^^7;(ze)), Vz, e Ve[j]}. Then, ¥{Bj) = \ for all j £ N. 

Proof. Fix some j G N. Consider the events {Yv[ni^o^Te{ze)[i + f\ > T*{ze), Vze S 
Ve [j] } and { lim;^oo Cp {ze)[i+j] = C* (ze ) } , both of which occur with probability one 
by Lemmas [Hand 121 respectively. Hence, their intersection occurs with probability 
one, i.e., 

Iim7;(z. )[/ + ;■] > T*{ze) A limCp(z. )[/ + ;] = C*{ze), Vz. G Ve[j] 

Finally, lim,-^^r,(z.)[/ + 7] > T*{ze) A limi^^Cpize)[i + j] =C*{ze) logically im- 
plies {{C*{ze) < r*(ze)) ==> (lim,-^,»Cp(xe) < lim,-^o„ Ti (ze ) ) ■ Substituting the latter 
in place of the former in the equation above yields the result. D 

Let Xe[(] denote the trajectory that is in evader's tree, Ge[i], by the end of iteration 
i and that reaches the goal region in minimum time. Recall that T* is the ending 
time of the minimum-time collision-free trajectory that reaches the goal region and 
avoids capture. 

The next theorem states the probabilistic soundness of Algorithm[3] That is, the 
probability that any evader strategy returned by the algorithm is sound (i.e., avoids 
capture by the pursuer) approaches one as the number of samples increases. More 
precisely, for all e > and allf G [0, T*], the probability that the state Xe[/](f) avoids 
capture, if the pursuer is delayed for e units of time in the beginning of the game, 
approaches one as / — > 00. 
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Theorem 2 (Probabilistic Soundness). Let A^j [i] denote the event {t < 
C*{x[i]{t)) + e}. Then, \imi^^F{Ae^t[i\) = I, for all e > and all t G [0,T]. 

Proof (Sketch). Let S'lj] = {zi,X2,...,zk} ^ K^] denote the set of all vertices 
in the evaders tree that are along the path Xe[j]. Let ^[j] = {ti,t2,... jtx} denote 
the corresponding time instances, i.e., Zk = Xe[tj]{tii) for all k G {1,2,..., A"}. By 
Lemma [3l the theorem holds for the time instances corresponding to the states in 
3f[j]. However, it must also be shown that the same holds for all trajectories that 
connect consecutive states in ^[j]. Such trajectories are referred to as intermediate 
trajectories from here on. 

Letfmax['] '■— ^^^ti-.ti- ieS[i\{tk+i —fA;). The algorithm provided in this paper does 
not check the soundness of intermediate trajectories, but checks only that of the 
vertices. However, it can be shown that for any e > 0, lim,-^ooP({frnax['] < £}) = 1- 
Roughly speaking, with probability one, the time-optimal path is never achieved, but 
the algorithm converges towards that optimal as the number of samples approaches 
infinity. Since each intermediate path that is along x^ [j] is sub-optimal with proba- 
bility one, in the process of convergence it is replaced with a lower cost path that 
includes two or more vertices of the tree in some later iteration / > j. 

Since fmax['] < £ logically implies that t < C*{x[i]{t)) + e for all t e [0,T], 
{?maxM < e} C {f < C*{x[i\{t)) + £, Vf e [0,7]}, which implies P({Wx['] < e}) < 
P({f < C*{x[i\{t)) + e}). Taking the limit of both sides yields the resuh. D 

Let us also note the following theorems regarding the probabilistic completeness 
and asymptotic optimality of the algorithm. The proofs of these theorems are rather 
straightforward and are omitted due to lack of space. 

Theorem 3 (Probabilistic Completeness). Under the assumptions of Theorem [7] 
Algorithm\3\finds a trajectory that reaches the goal region while avoiding collision 
with obstacles and capture by pursuers, if such a trajectory exists, with probability 
approaching one as the number of samples approaches infinity. 

Theorem 4 (Asymptotic Optimality). Let L[{\ be the cost of the minimum-time tra- 
jectory in the evader's tree at the end of iteration i that reaches the goal region, if 
any is available, and +o° otherwise. Then, under the assumptions ofTheorem\l\ L[i\ 
converges to the value of the pursuit-evasion game, L* , almost surely. 



5 Simulation Examples 

In this section, two simulation examples are presented. In the first example, an 
evader modeled as a single integrator with velocity bounds is trying to reach a goal 
set, while avoiding capture by three pursuers, each of which is modeled as a single 
integrator with different velocity bounds. More precisely, the differential equation 
describing the dynamics of the evader can be written as follows: 



d , . d 
dt dt 



■^e,2(0 



Me(f) 



Me,l(r) 
Me,2(0 
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(a) 



(b) 





(c) 



(d) 



Fig. 1 The evader trajectory is shown in an environment with no obstacles at the end of 500, 
3000, 5000, and 10000 iterations in Figures (a), (b), (c), and (d), respectively. The goal region 
is shown in magenta. Evader's initial condition is shown in yellow and the pursuers' initial 
conditions are shown in black. The first pursuer. Pi , which can achieve the same speed that 
the evader can achieve, is located in top left of the figure. Other two pursuers can achieve 
only half the evader's speed. 



where ||Me(?)||2 < 1- The dynamics of the pursuer is written as follows: 



dt 



Xp{T) 



d 
dt 



A-Pl 



A-P2 



■Vp3 



it) 
(t) 
it) 





"■^Pi.i(0 




Xpi.lit) 


d 


Xp2:iit) 


dt 


Xp2-2{t) 




■^P3.l(0 




L-^P3.2(o 



Mp(r) 



«Pi (t) 

Up,_{t) 
Upjit) 



Up,. lit) 
«P1,2(0 
Up2,lit) 
Up2,2it) 
"P3.l(0 
"P3.2(0 



where ||Mp, (/)||2 < 1 and ||Mp^(?)||2 < 0.5 for^e {2,3}. 
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First, a scenario that involves an environment with no obstacles is considered. 
The evader's trajectory is shown in Figures [l (a)] 1 (d)| in several stages of the al- 
gorithm. The algorithm quickly identifies an approximate solution that reaches the 
goal and stays away from the pursuers. The final trajectory shown in Figure [T(d)] 
goes towards the goal region but makes a small deviation to avoid capture. The same 
scenario is considered in an environment involving obstacles and the evader's tree is 
shown in different stages in Figure [2(a)]|2(d)| Notice that the evader may choose to 
"hide behind the obstacles" to avoid the pursuers, as certain parts of the state space 
that are not reachable by the evader are reachable in presence of obstacles. 

In the second example, the motion of the pursuer and of the evader is described 
by a simplified model of aircraft kinematics. Namely, the projection of the vehi- 
cle's position on the horizontal plane is assumed to follow the dynamics of a Du- 
bins vehicle (constant speed and bounded curvature), while the altitude dynamics 
is modeled as a double integrator. The differential equation describing dynamics of 
the evader is given as follows. Let Xe(/) = [xe.i{t) , Xe2{t) , Xg,i{t) , Xs^4{t) , Xe^s{t)]''' 
and f{x^{t),u^{t)) = [ve cos(.)Ce.3(f)), Ve sin(xe,3(0), "e.ilOi-^e.slOi "e.2(0]^' and 





(a) 



(b) 





(c) (d) 

Fig. 2 The scenario in Figure[T]is run in an environment with obstacles. 
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Fig. 3 Figures (a) and (b) sliow the trees maintained by the evader at end of the 3000th itera- 
tion in an environment without and with obstacles, respectively. The initial state of the evader 
and the pursuer are marked with a yellow pole (at bottom right of the figure) and a black pole 
(at the center of the figure), respectively. Each trajectory (shown in purple) represents the set 
of states that the evader can reach safely (with certain probability approaching to one). 



Xe{t) = f{xf.{t),Uf.{t)), where Ve = 1, |Me.i(OI ^ 1' l"e,2(0l ^ 1' l-'^e.sl < 1- In this 
case, Ve denotes the longitudinal velocity of the airplane, Mg.i denotes the steering 
input, and Me.2 denotes the vertical acceleration input. The pursuer dynamics is the 
same, except the pursuer moves with twice the speed but has three times the mini- 
mum turning radius when compared to the evader, i.e., Vp = 2, |Mp.i | < 1/3. 

A scenario in which the evader starts behind pursuer and tries to get to a goal 
set right next to the pursuer is shown in Figure [3] First, an environment with no 
obstacles is considered and the tree maintained by the evader is shown in Figure [3(a)l 
at end of 3000 iterations. Notice that the evader tree does not include a trajectory 
that can escape to the goal set (shown as a green box). Second, the same scenario 
is run in an environment involving obstacles. The trees maintained by the evader 
is shown in Figure |3(b)[ Note that the presence of the big plate-shaped obstacle 
prevents the pursuer from turning left directly, which allows the evader to reach a 
larger set of states to the left without being captured. In particular, the evader tree 
includes trajectories reaching the goal. 

Simulation examples were solved on a laptop computer equipped with a 
2.33 GHz processor running the Linux operating system. The algorithm was 
implemented in the C programming language. The first example took around 3 sec- 
onds to compute, whereas the second scenario took around 20 seconds. 



6 Conclusions 



In this paper, a class of pursuit-evasion games, which generalizes a broad class 
of motion planning problems with dynamic obstacles, is considered. A computa- 
tionally efficient incremental sampling-based algorithm that solves this problem 
with probabilistic guarantees is provided. The algorithm is also evaluated with 
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simulation examples. To the best of authors' knowledge this algorithm constitutes 
the first incremental sampling-based algorithm as well as the first anytime algorithm 
for solving pursuit-evasion games. Anytime flavor of the algorithm provides advan- 
tage in real-time implementations when compared to other numerical methods. 

Although incremental sampling-based motion planning methods have been 
widely used for almost a decade for solving motion planning problems efficiently, 
almost no progress was made in using similar methods to solve differential games. 
Arguably, this gap has been mainly due to the inability of these algorithms to gen- 
erate optimal solutions. The RRT* algorithm, being able to almost-surely converge 
to optimal solutions, comes as a new tool to efficiently solve complex optimization 
problems such as differential games. In this paper, we have investigated a most ba- 
sic version of such a problem. Future work will include developing algorithms that 
converge to, e.g., feedback saddle-point equilibria of pursuit-evasion games, as well 
as relaxing the separability assumption on the dynamics to address a larger class of 
problems. 
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Multiagent Pursuit Evasion, or Playing Kabaddi 

Kyle Klein and Subhash Suri 



Abstract. We study a version of pursuit evasion where two or more pursuers are 
required to capture the evader because the evader is able to overpower a single de- 
fender. The pursuers must coordinate their moves to fortify their approach against 
the evader while the evader maneuvers to disable pursuers from their unprotected 
sides. We model this situation as a game of Kabaddi, a popular South Asian sport 
where two teams occupy opposite halves of a field and take turns sending an at- 
tacker into the other half, in order to win points by tagging or wrestling members of 
the opposing team, while holding his breath during the attack. The game involves 
team coordination and movement strategies, making it non-trivial to formally model 
and analyze, yet provides an elegant framework for the study of multiagent pursuit- 
evasion, for instance, a team of robots attempting to capture a rogue agent. Our pa- 
per introduces a simple discrete (time and space) model for the game, offers analysis 
of winning strategies, and explores tradeoffs between maximum movement speed, 
number of pursuers, and locational constraints!^ 



1 Introduction 

Pursuit-evasion games provide an elegant and tractable framework for the study of 
various algorithmic and strategic questions with relevance to exploration or moni- 
toring by autonomous agents. Indeed, there is a rich literature on these games under 
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various names lfT3l . such as man-and-the-lion llT2l[T6l l9]|. cops-and-robber ||6l[TJ[8j 
[BJHI, robot-and-rabbit ||6l, and pursuit-evasion llT4l [5ir71. iust to name a few. 

In this paper we study a (discrete time, discrete space) version of pursuit evasion 
where two or more pursuers are required to capture the evader because the evader is 
able to overpower a single (and isolated) defender. These situations arise in pursuit 
of a rogue non-cooperative agent, which could be a malfunctioning robot, a deliri- 
ous evacuee, or a noncooperative patient. Thus, the pursuers are forced to coordinate 
their moves to fortify their approach against the evader while the evader maneu- 
vers to disable pursuers from their unprotected sides. In the basic formulation of 
the game, all agents have the same capabilities including the maximum movement 
speed, but we also derive some interesting results when one side can move faster 
than the other. 

In modeling our pursuit-evasion scenario, we draw inspiration from the game of 
Kabaddi, which is a popular South Asian sport. The game involves two teams occu- 
pying opposite halves of a field, each team taking turns to send an "attacker" into the 
other half, in order to win points by tagging or wrestling members of the opposing 
team [17 1 . The attacker must hold his breath during the entire attack and success- 
fully return to his own half — the attacker continuously chants "kabaddi, kabaddi, 
• • •" to demonstrate holding of the breath. There are several elements of this game 
that distinguish it from the many other pursuit games mentioned above, but perhaps 
the most significant difference is that it typically requires two or more defenders to 
capture the opponent, while the attacker is able to capture a single isolated defender 
by itself. This asymmetry in the game adds interesting facets to the game and leads 
to interesting strategies and tradeoffs. 

While the use of multiple pursuers is common in many existing pursuit evasion 
games, the main concern in those settings is to simply distribute pursuers in the 
environment to keep the evader from visiting or reentering a region. This is indeed 
the case in all graph searching ^ |TT] [141 or visibility based pursuit evasion |[5j 
171 [TSl . In the lion-and-the-man game also there are known results that show that 
multiple lions can capture the man when the man lies inside the convex hull of the 
lions [TOl. By contrast, the main question in Kabaddi is whether the defenders can 
ever force the attacker inside their convex hull, perhaps even by sacrificing some 
of their agents. The other games such as the cops-and-robber differ from kabaddi 
in the way capture occurs as well as the information about the evader's position. 
For instance, the current position of all the players is public information in kabaddi 
while the position of the robber or evader is often assumed to be unknown to cops 
or pursuers. Furthermore, it is also typically assumed that each cop (robot) follows 
a fixed trajectory that is known to the robber (rabbit). This makes sense in situations 
where the defenders (cops) have fixed patrol routes, but not in interactive games 
like kabaddi. The problems and results in the graph searching literature are also of 
a different nature than ours ll2l fTTIl . although variations using differential speed lH 
and capture from a distance |[3l have been considered in graphs as well. 

Finally, in the visibility-based pursuit-evasion games, the evader is often assumed 
to have infinite speed, and the capture is defined as being "seen" by some defender — 
both infinite visibility or limited-range visibility models have been considered fSJIHl. 
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By contrast, kabaddi involves equal speed agents and requires a physical capture that 
leads to a very different set of strategies and game outcomes. With this background, 
let us now formalize our model of kabaddi. 



1.1 The Standard Model 

We consider a discrete version of the game, in which both time and space are dis- 
crete: the players take alternating turns, and move in discrete steps. In particular, 
the game is played on a « x n grid S, whose cells are identified as tuples (i, j), with 
i,y e {1,2, .. . ,«}. We will mainly use the Kabaddi terminology, namely, attacker 
and defenders, with the former playing the role of the evader and the latter the pur- 
suers. Our main analysis will focus on the case of one attacker and two defenders, 
although in the latter part of the paper, we do derive some results for the case of d 
defenders, for any d> \. 

We use the letters A and D to denote the attacker and a defender, respectively. 
When there are multiple defenders, we use subscripts such as Di,D2, etc. We need 
the concepts of neighborhood, moves, and capture to complete the description of the 
game. Throughout, we assume that precisely two defenders are required to capture 
the attacker. 

Neighborhood. The neighborhood N{p) of a cell p = (/, j) is the set of (at most) 
9 cells, including p itself, adjacent to p, or equivalently the set of all cells with L^, 
distance at most 1 from p. In Figure[Tl the neighborhood of A is shown with a box 
around it. Slightly abusing the notation, we will sometimes write N{A) or N{D) to 
denote the neighborhood of the current position of A or D. 
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Fig. 1 The standard model of kabaddi. A can capture the defender closer to it, which is inside 
N{A). The defenders can capture A at any position in the shaded region, which is the common 
intersection of their neighborhoods. 



Moves. The attacker and the defenders take turns making their moves, with the 
attacker moving first. In one step, the attacker and the defenders can move to any 
cell in their neighborhood. All the defenders can move simultaneously in one step. 

Capture. A captures a defender/) if it is the unique defender lying inside the neigh- 
borhood of A. That is, with two defenders, D\ is captured when D\ G N{A) and 
D2 ^ N{A). (Notice that A only needs to enclose a defender within its neighborhood 
to capture it.) 
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Conversely, the defenders capture the attacker, when A Hes in the common inter- 
section of the two defenders' neighborhoods. That is, A G {N{Di) nN{D2))- 

Game Outcome. The attacker wins the game if he can capture one or more defend- 
ers, and the defenders win the game if they can capture the attacker. If neither side 
wins, then the game is a tie. 

This particular form of capture has a tendency to make defenders always stick 
together, and fails to model the real world phenomenon where defenders try to "sur- 
round" the attacker — see figure above. We therefore introduce a minimum separa- 
tion condition on the defenders in the following way: 

no defender can be inside the neighborhood of another defender 

These rules together define our standard model of kabaddi. Other models can be 
obtained by varying the definition of the neighborhood and relaxing the separation 
condition for defenders, and we obtain some results to highlight the impact of these 
modeling variables. 

Safe Return and Holding of the Breadth. In Kabaddi, the attacker must hold his 
breath during the attack, and after the attack successfully return to his side. These are 
non-trivial issues to model tractably, and we exclude them from our current model, 
instead relying on the following interpretation: the worst-case number of moves 
before the game's outcome serves as a proxy for the breath, and the attacker can 
conservatively decide at some point to return to his side. However, if this duration is 
known to the defenders, then they can attempt to interfere with his return. We leave 
these interesting, but complicated, issues for future work. One could argue that these 
issues are not important in the multiagent pursuit-evasion problem. 

1.2 Our Results 

In the case of a single attacker A against a single defender D, the game resembles the 
discrete version of the man-and-the-lion. We include a simple analysis of this case 
for two reasons: first, it serves as a building block for the multi-defender game; and 
second it allows us to highlight the impact of player's speed on the game outcome, 
which we believe is a new direction in pursuit evasion problems. Unsurprisingly, 
in the single defender case, the attacker can always capture the defender D in 0{n) 
number of steps, which is clearly optimal, upto a constant factor, in the worst-case. 

We show that a speed of 1 +©(!/«) is both necessary and sufficient for the de- 
fender to indefinitely evade the attacker. In particular, a defender with the maximum 
speed l+5/(| — 3) can evade the attacker indefinitely, but a defender with the max- 
imum speed of 1 + l/« can be captured. 

The game becomes more challenging to analyze with two defenders, where the 
attacker continuously runs the risk of being captured himself, or have the defenders 
evade him forever. Our main result is to show that the attacker has a winning strategy 
in worst-case 0{n) moves. One important aspect of the standard model is the sepa- 
ration requirement for the defenders — each must remain outside the neighborhood 
of the other. Without this restriction, we show that the two defenders, whom we call 
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strong defenders to distinguish from the standard ones, can force a draw: neither the 
attacker nor a defender can be captured. A further modification of the model, which 
disallows the diagonal moves, tips the scale further in the favor of strong defenders, 
allowing them to capture the attacker in 0{n^) steps. 

Extending the analysis to more than two players is a topic for ongoing and future 
work, and seems non-trivial. Surprisingly, for the standard model, it is not obvious 
that even Q{n) defenders can capture the attacker, nor it is obvious that the attacker 
can win against k defenders, for k>2. (The definition of capture remains the same: 
two defenders are enough to capture the attacker.) 

However, if we endow the agents with different speeds, then we can obtain some 
interesting results, as in the case of the single defender mentioned earlier. In partic- 
ular, if the attacker can make min{10,(i— 1} single steps in one move, then it can 
avoid capture indefinitely against d defenders, and if it can make min{ 11, d} steps 
per move, then it can capture all d defenders in time 0{dn). Thus, the attacker has 
a winning or non-losing strategy with 0(1) speed against an unbounded number of 
players, assuming a safe initial position. 

2 One on One Kabaddi 

We begin with the simple case of the attacker playing against a single defender. 
Besides being of interest in its own right, it also serves as building block for the 
more complex game against two defenders. We show that in this case the attacker 
always has a winning strategy in 0{n) moves. 

Throughout the paper, we assume that the grid is aligned with the axes, and use 
Ax = \Dx —Ax\ and Ay ~ \Dy ~Ay\, resp., for the x (horizontal) and the y (vertical) 
distance between A and D. 

Theorem 1. The attacker can always capture a single defender in a n x n game of 
kabaddi in 0{n) moves. 

Proof. The attacker's basic strategy is to chase the defender towards a wall, keeping 
him trapped inside a continuously shrinking rectangular region. Specifically, as long 
as minjzijc, z\y} > on its move, the attacker makes the (unique) diagonal move 
towards the defender, reducing both Ax and Ay by one. Because the grid is « x n, 
the attacker can make at most n such moves before either Ax or Ay becomes zero. 
Without loss of generality, suppose Ax — 0. From now on, the attacker always moves 
to maintain Ax — while reducing Ay by one in each move. Because Ay can be 
initially at most n, the attacker can reduce to it one in at most « — 1 moves, at which 
point it has successfully captured the defender because both Ax and Ay are at most 
1 . This completes the proof. 



3 Attacker against Two Defenders 

The game is more complex to analyze against two defenders. We begin by isolating 
some necessary conditions for the game to terminate, or for the next move to be 
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safe. We then discuss the high level strategy for the attacker, and show that it can 
pursue the defenders using that strategy without being captured itself. Together with 
a bound for the duration of the pursuit, this yields our main result of 0{n) steps win 
for A in the standard model. We denote the two defenders by Di and D2, and use D 
to refer to a non-specific defender when needed. Throughout the game, we ensure 
that whenever A makes a move, it is safe in the sense that it cannot be captured by 
the defenders in their next move. 

Lemma 1. On A's turn, if maxjzix, 4}'} < 2 for at least one of the defenders, 
then A can capture a defender in one step. Conversely, on the defenders' turn, if 
maxjzijt, Ay} > 2 for one of the defenders, then they cannot capture A on their 
move. 

Proof. We first observe that neither defender can be inside the neighborhood of A, 
namely, N{A). This holds because a single defender inside N{A) must have been 
captured in A's last move and if both the defenders are inside A' (A), then they would 
have captured A in their last move. Thus, we must have maxjzix, Ay} > 2 for both 
the defenders. 

Let D\ be the defender that satisfies the conditions of the lemma, meaning that 
max{/\jt, Ay} = 2. If both the defenders satisfy the condition, then let us choose 
the one for which Ax + Ay is smaller; in case of a tie, choose arbitrarily. Without 
loss of generality, assume that Dj lies in the upper-left quadrant from A's position 
(i.e. north-west of A). We now argue that A can always capture Di as follows. See 
Figure [21 



ID 

_^JL__ __JL 

D I A A A 



(a) 



(b) 



(c) 



Fig. 2 Illustrating the three cases in Lenima[Tl Ax^Ay = 2 |(a)| 3 [(b)l and 4 |(c)| The shaded 
area is the region that cannot contain the second defender. 



If Ax + Ay = 2, then we must have either Ax — 2,Ay — 0orAx^0,Ay — 2.ln the 



former case, A can capture Dj by moving to its x-neighbor (shown in Figure[2 |(a)] ), 
and in the latter by moving to its y-neighbor. Since the second defender must lie 
outside N{A) UN{Di), this move cannot cause A to be captured. Similarly, if Ax + 
Ay = 3 (shown Figure|2 |(b^ , then we have either Ax = 2, 4);= 1, or Ax= l,Ay = 2. 
In both cases, A captures Z)i by moving to its north-west neighbor {A^ — l,Ay+ 1). 
Observe that, by the minimum separation rule, if there is a defender at (A, — 2, A, + 
1), then there can't be one at (Ax — l,Ay + 2), and vice versa ensuring the safety 
of this move — there also cannot be a defender at {Ax,Ay + 2) because that would 
contradict the choice of the closest defender by distance. 
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Finally, if Ax + Ay — 4 (shown Figure |2 |(c)l ), then A captures Di by moving to 
(Av — 1 , Ay + 1 ) . This is a safe move because the only position for D2 that can capture 
A is at {Ax,Ay + 2), but in that case D2 is the defender with the minimum value of 
Ax + Ay, contradicting our choice of the defender to capture. This completes the 
first claim of the lemma. For the converse, suppose that Ax > 2 for defender Dj. 
Then, after the defenders' move, A is still outside the neighborhood of Di , and so A 
is safe. This completes the proof. 

The attacker initiates its attack by first aligning itself with one of the defenders in 
either x or y coordinate, without being captured in the process. The following two 
technical lemmas establish this. 

Lemma 2. A can move to the boundary in 0{n) moves without being captured. 

Proof. By assumption, A is currently safe. We first check whether A can capture a 
defender in the next move: if so, he wins. Otherwise, by Lemma [TJ we must have 
that max{/\jt, ziy} > 2 for both Di and D2. The attacker A now (arbitrarily) chooses 
a defender, say, Di and moves so as to increase both its x and y distances to that 
defender by one — this is always possible unless A is already on the boundary. Be- 
cause this always maintains max{z\x, Ay} > 2 with respect to Di, by Lemma[Tl the 
defenders cannot capture A, and is A guaranteed to reach the boundary mO{n) steps. 

Lemma 3. By moving along the boundary, A can always force either Ax = or 
Ay = Ofor one of the defenders in 0{n) moves, without being captured. 

Proof. Without loss of generality, assume that A is on the bottom boundary, and 
that at least one of the defenders, say, Di lies in its upper-right quadrant (i.e. has 
larger x coordinate). Then, A's strategy is to always moves right on its turn, and is 
guaranteed to achieve Ax — with D\ at some point. We only need to show that A 
cannot be captured during this phase. But if A were captured at position ((,0), then 
the defenders must be at positions (; — 1,71) and (;+ 1,72), for 71,72 £ {0,1} — 
these are the only positions whose neighborhoods contain the cell (;', 0) in common. 
However, the position of A one move earlier was (/ — 1,0), so the first defender 
would necessarily satisfy the conditions of Lemma[T]and would have been captured 
by A already. 

3.1 The Second Phase of the Attack 

Having reached the starting position for this second phase of the game, we assume 
without loss of generality that A is at the bottom boundary, and that after A's last 
move. Ax = for one of the defenders. From now on, A will always ensure that 
Ax < 1 for one of the defenders after each of A 's moves. The x-distance can become 
Ax — 2 after the defenders ' move but A will always reduce it to 1 in its next move. 
By Lemma [T] if both Ax and Ay are at most 2, then A can win the game. On 
the other hand, if the players are too far apart, then both sides are safe for the next 
move. Thus, all the complexity of the game arises when the distance between A and 
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Fig. 3 Proofs of Lemmas |4](a)|and[51(b) 



the defenders is 3, requiring careful and strategic moves by both sides. We show 
that A can always follow an attack strategy that ensures a win in 0{n) steps, while 
avoiding capture along the way. 

In order to measure the progress towards A's win, we use the distance from A's 
current position to the top boundary of the grid while ensuring that Ax<l continues 
to hold. In particular, define <1>{A) as the gap between the current^ position of A and 
the top boundary. That is, <X>{A) = [n —Ay), where this gap is exactly n—\ when the 
second phase begins with A on the bottom boundary. We say that A makes progress 
if (t>{A) shrinks by at least 1 , while Ax remains at most 1 for some defender. Clearly, 
when the 4>(A) reaches one, A has a guaranteed win (by Lemma[T]l- If the attacker 
succeeds in capturing a defender, then we consider that also progress for the attacker. 

The overall plan for our analysis is the following: 

1. If max{Ax, Ay} < 2 for at least one defender, then the attacker wins in one move 
(Lemma [T]l. If Ay > 3 for some defender, then A can move to reduce Ay by 
one, while keeping Ax < 1, and this move is safe by Lemma [T] Thus, the only 
interesting cases arise when Ay — 3; these are handled as follows. 

2. If Ay = 3 and Ax — for some defender, then Lemma |4] below shows that A 
makes progress in 0( 1 ) number of moves. 

3. If A V = 3 and Ax = 1 or 2 for some defender, then Lemmas |5] and |6] show that A 
can make progress in 0{n) number of moves. 

In the following, we use the notation A'^ {p) to denote the 2-neighborhood of a cell 
p, meaning all the positions that can be reached from p in two moves. 

Lemma 4. On A 's move, if Ay — 3 and Ax = holds for some defender, then A 
makes progress in one move. 



Proof. Figure [3l |(a)| illustrates the game configuration for this case, where the de- 
fender satisfying the distance condition Ax = 0,Ay ~3 is shown as D. There are 
three positions for A to advance and make progress, and they are marked as x in the 
figure — in each case, the y distance reduces by 1 , while Ax remains at most 1 . We 
only need to show that A can move to one of these positions without being captured 
itself. 
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In order to prove this, we observe that (1) neither defender is currently inside 
N^{A) because that is a winning configuration for A by Lemma [TJ (2) the second 
defender is not in N{D), as required by the separation rule for defenders. Thus, the 
second defender must be outside A^^ (A) [JN{D). But in order to foil A's move to all 
three x positions, the second defender must also be within the 2-neighborhood of all 
the X positions. That, however, is impossible, as is readily confirmed by inspection 
of Figure [3] |(a)[ Thus, A can safely move to one of the positions marked as x, and 
guarantee progress. We note that when A and D are on the boundary, there are two x 
positions instead of three, and in that case A can always move to the x directly north 
and make progress. 

Lemma 5. On A 's move, if Ay — 3 and Ax = 1 holds for some defender, then A 
makes progress in 0{n) number of moves. 



Proof. Figure [3] |(b)| illustrates the game configuration for this case, where the de- 
fender satisfying the distance condition Ax = I, Ay = 3 is shown as D. (We assume 
without loss of generality that D, = A, + 1 because the case D^ = A^ — 1 is entirely 
symmetric.) In this case, there are two positions marked x that allow A to make 
progress by reducing Ay. In order to foil A's move, the second defender must be 
positioned so as to cause A's capture at both these positions. Reasoning as in the 
previous lemma, however, D2 has to lie outside both N{D) as well as N^{A). It is 
easy to see that there is precisely one position forZ)2, shown as the shaded cell, that 
threatens A's capture at both the x positions. 

This is a case where A cannot ensure progress in a single step, and instead a 
multi-step argument is needed. In particular, A moves to its right neighboring cell, 
at location (Av + 1,A,,), which does not improve ^(A), but we show that 0(A) 
will improve in 0{n) steps. Consider the next move of the defenders. The defender 
labeled D must move to a cell within N{D), and we analyze the progress by A as 
follows: (i) if D moves up, making its distance from A equal to Ay — 4, then the next 
move of A makes a guaranteed progress by moving to make Ay = 3 and Ax < 1 . This 
move is safe for A by Lemma[T] (ii) if D moves down, making its distance from A 
equal to Ay — 2, then, A has a guaranteed win according to Lemma[T] (iii) if D stays 
in its current cell, then we have Ay = 3 and Ax = on A's move, for which Lemma|4] 
guarantees progress in one move. 

Thus, the only interesting cases are if D moves to its left or right neighbor. If D 
moves left, causing Ay = 3 and Ax — 1, then A can immediately make progress be- 
cause both the defenders are on the left side of A's position (recall that A was forced 
to make a move without progress because the second defender was in the shaded 
cell), and so A can safely move diagonally to reduce both Ax and Ay distances to D. 
In this case we have progress in a total of 2 moves. 

On the other hand, if D moves to its right neighbor, then the situation of impasse 
can persist, because both positions marked x where A can make progress can cause A 
to be captured. This forces A to continue to mimic D's rightward move by moving to 
its right neighbor. However, this impasse can continue only for 0{n) moves because 
as soon as D reached the right boundary of the field, he is forced to move up, down, 
or left, giving A a chance to make progress. This completes the proof of the lemma. 
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Lemma 6. If Ay = 3 and Ax = 2 for some defender say Di then A may make 
progress in 0{n) moves. 

Proof. The proof is similar to the proof of Lemma [5j and omitted due to lack of 
space. 

3.2 Completing the Analysis 

We can now state our main theorem. 

Theorem 2. In the standard model of kabaddi on a n x n grid, the attacker can 
capture both the defenders in 0(n) worst-case moves. 

Proof. We show that, starting from an initial safe position, the attacker always has 
a move that keep him safe for the next move of the defenders, and that after 0{n) 
moves the attacker can place itself on a boundary with either Ax — or Ay ~ for 
some defender. Without loss of generality, suppose the attacker reaches the bottom 
boundary, with Ax = (Lemmas [21 E]). In the rest of the game, the attacker always 
maintains Ax < 1 after each of its moves. The attacker's next move is described as 
follows: 

1. If niax{Ax,Ay} < 2 for some defender, then the attacker can capture a defender 
in 1 move (Lemma[Tl and the remaining defender in 0{n) moves. 

2. If Ay > 4, then the attacker always moves to reduce Ay and Ax by one, unless 
Ax is already zero. 

3. If Ay = 3, then depending on whether Ax = 0, 1 or 2, the attacker's strategy is 
given by Lemma |4l|5] or |6l respectively. 

These cases exhaust all the possibilities, and as argued earlier, the attacker can re- 
duce <&(A) by one in 0{n) moves. Since the maximum possible value of ^(A) is 
initially n — 1, and it monotonically decreases, we must reach ^(A) = 1 in worst- 
case 0{n^) moves, terminating in a win by A. 

We now argue that the 0{n^) bound is pessimistic and that 0{n) moves suffice. 
The key idea is that once the attacker forces Ax — 0, it only moves to the three cells 
above it and the one to its right. The three upward moves clearly cause progress, so 
we only need to argue that the rightward moves happen 0{n) times. This follows 
because the grid has width n, and therefore after at most n — 1 rightward moves, 
every additional rightward move must be preceded by some leftward move. Since 
the attacker always moves upward in its left-directed moves, it makes progress in 
each of those moves. Then due to the fact A only needs n — 2 upward moves, there 
can be at most 2m — 3 right moves (the initial n — 1 moves plus the n — 2 moves 
corresponding to upward moves), and thus at worst 3n — 5 total moves. Thus the 
attacker captures both defenders in 0{n) worst case moves. This completes the proof 
of the theorem. 
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4 Strong Defenders 

In the standard model, each defender must remain outside the neighborhood of other 
defenders; that is, D, ^ N{Dj), for all i,j. The defenders become more powerful 
when this requirement is taken away. Let us call these stronger defenders. In this 
case we explore what happens when we remove the stipulation that the defenders 
cannot be within each other's neighborhoods. This creates two stronger defenders 
and as a result creates a game where ideal play means not only can the attacker 
not win, but the defenders cannot either. We assume that play starts with defenders 
already in a side-by-side position, that is. Ax + Ay— 1 with respect to Di and £)2's 
coordinates. 

Theorem 3. Under the strong model of defenders, there is a strategy for the defend- 
ers to avoid capture forever At the same time, the attacker also has a strategy to 
avoid capture. 

Proof. We first argue that the attacker can evade capture. Suppose that the defenders 
were to capture A in their next move. If neither defender is inside N'^{A), then A is 
clearly safe in its current position for the defenders' next move, so at least one of 
the defenders, say Di is inside A^^ (A). Unless D2 G N{Di), by Lemma[T] then A can 
capture Di in its next move. Thus, Di,D2 must be adjacent, namely, in each other's 
1 -neighborhoods. 

We now argue that all defender positions from which they can capture A in the 
next move are unsafe, meaning the attacker can capture one of the defenders in its 
current move. There are only two canonical positions for the defenders with one or 
both of them in the outer cells of N^{A): either side-by-side, or diagonal from one 
another. In the first case, the defenders only threaten the cells in front of them but 
not those that are diagonal, so A can move to one of those diagonal spaces. In the 
second case, A can capture by moving to any space diagonal from a defender. 

Similarly, we can show that defenders can also avoid capture. Figure |4] shows a 
representative situation just before the defenders' move. Suppose that the attacker 
were to capture one of the defenders in its next move. We claim that the cells marked 
as A in the figure are the only places (upto symmetry) for the attacker's current 
position — i.e. these are the positions where A is not captured currently but can cap- 
ture a defender by moving to the cells shown shaded. This is found by taking the 
union of the 1 -neighborhoods of the three shaded spots (the only places A captures 
a defender) to find all possible places A may move to capture from, then removing 
all those that the defenders could capture. This result in a list of spots D cannot 
capture but must avoid capture from. However, the defenders can avoid this capture 
by simply "flipping" their orientation, as shown by arrows in the figure. Notice that 
after the flip the attacker now cannot capture with its move. Also the flip does not 
rely on the position of the boundary, as the defenders move up only if the attacker 
is above them, and move down only if the attacker is below them. Thus this can be 
performed regardless of location. 
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Fig. 4 Illustrates Theorem |3] 

5 Strong Defenders with Manhattan Moves 

Thus, in the standard model but with strong defenders, we have a tie, and neither 
side can guarantee a win. In the following, we show that if we disallow the diagonal 
moves, permitting a player to move only to its left, right, up, and down neighbors, 
then the defenders have a winning strategy. That is, the movement metric is Man- 
hattan metric — a player can only move to a cell within the Li distance of 1 from 
its current cell. The definition of the capture, however, remains the same as in the 
standard model. Due to lack of space, the proof of the following theorem is omitted 
from this extended abstract. 

Theorem 4. Two strong defenders playing under the Manhattan moves model can 
always capture the attacker in 0{n ) moves. 



6 Differential Speed Pursuit Evasion 

So far, we have assumed that all players have the same (unit) speed. While we are 
unable to resolve the outcome of these games when the attacker plays against more 
than two defenders, we show below that differential speed leads to some interesting 
results. We model the speed as the number of unit-step moves a player can make 
on its turn — each step is the same elementary move used in the standard model. In 
particular, on its turn, a player with speed s can repeat the following s times, starting 
at a cell p — po: 

move to any cell p' G N{p), and set p — p'- 

We allow the speed to be any rational number. Thus, a player with movement speed 
5 + - can make s unit step moves on each turn plus it can make s+\ steps on 
every [q/p\ th turn. Please note that this definition is not the same as being able to 
move to a cell at distance at most s — specifically, our attacker has a chance to visit, 
and possibly capture, s defenders in a single move. However, during his turn, if the 
attacker is ever in the common intersection of two defenders' neighborhoods, then 
it is captured (as in the standard model). 

We first consider the minimum speed advantage needed by a single defender to 
escape the attacker forever. 
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6.1 One on One Game with Speedier Defender 

The following theorem shows that a speed of 1 + 1 /n is not enough for the single 
defender to evade capture by the attacker. 

Theorem 5. A defender with maximum speed 1 + - can be captured in 0{n) moves 
by an unit-speed attacker on the n x n grid. 

Proof. The attacker's strategy is the same as in Theorem [T] We simply observe 
that despite the speed disadvantage the attacker still reduces either Ax or Ay to zero 
within n moves. Without loss of generality, assume that Ax becomes zero. After that, 
the attacker can also enforce Ay = within n moves. In these n moves, the defender 
gains only one extra move, which only increases Ax to 1 , but is still sufficient for 
the capture. Thus the defender is captured in 0{n) moves by the attacker. 

Surprisingly, it turns out that a speed of 1 + 0(1/m) suffices for the defender to 
escape, as shown in the following theorem. 

Theorem 6. A defender with maximum speed 1+5/(^ — 3) can indefinitely evade 
the attacker on an nx n grid. 

Proof. Assume an initial placement of the two agents in which (1) the defender D 
is at least distance n/4 from its closest boundary, which we assume to be the bottom 
boundary, (2) A is distance | + 3 from the same boundary, and (3) Ax + Ay — 3. (The 
defender can easily enforce the condition Ax = 0, and the remaining conditions are 
to achieve a safe initial separation between the attacker and the defender.) We argue 
that the defender can successfully maintain these conditions, and when needed use 
its extra moves to reestablish them with respect to a different boundary. 

The defender's strategy now is to simply mimic the moves of the attacker as long 
as it can do this without running into a boundary. During these moves, the defender 
is safe because of the condition Ay = 3 or Ax — 3 (cf. Lemma[Tll. 

Since the defender D is at least n/4 away from the boundary that is opposite the 
attacker, its speed advantage guarantees it 5 extra steps before it can no longer mimic 
a move of the attacker — which can only happen due to running into a boundary. We 
now assert that the 5 extra moves are sufficient for D to reestablish the starting 
conditions without being captured. This is illustrated in Figure [5l |(a)[ where only 
a small portion of the grid surrounding the players is shown for clarity. With its 5 
moves (shown labeled 1 , 2, . . . , 5), the defender is able to restore the initial condition 
with respect to the right boundary. During this maneuver, the defender maintains a 
safe distance from A, and therefore is not captured. 

Of course, the defender earns its five extra moves gradually, and not at once, but 
it is easy to see that the defender can plan and execute these extra moves (amortize, 
so to speak) during the at least n/4 moves it makes mimicking A, as it earns them. 
In particular, D always "rotates" around the attacker in the direction of the farther 
of the two boundaries, which must be at distance at least n/2. D cannot run into a 
boundary because the closest one is at least n/4 away and it completes its rotation 
in J — 3 turns, during which the 5 extra moves will never decrease the defender's 
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Fig. 5 Figure |(a)| illustrates the proof of Theorem |6l the defender uses 5 extra moves to 
reestablish the initial conditions. Figure [(b)] illustrates Theorem [7] the attacker can capture 
seven of the maximum possible eight defenders using 9 steps, and return to its original posi- 
tion in the 10th step. 

distance to that boundary. The new target boundary is at least «/2 away and once 
the attacker finishes its rotation must still be n/4 away. This is because there are 
at most 5 — 3 moves in this direction resulting from moves mimicking A and the 
three additional moves from the rotation. Thus, after the rotation, the defender is 
n/4 away from a boundary and the attacker is | + 3 from the same boundary, with A 
and D both in the same row or column. Thus the defender can continue this strategy 
forever and avoid capture. 



6.2 Speedier Attacker against Multiple Defenders 

We now consider the speed advantage of attacker against multiple defenders. We 
showed earlier that in the standard model, the unit-speed attacker wins against two 
unit-speed defenders. However, the game against more than two defenders remains 
unsolved. In the following we show that with a constant factor speed advantage, a 
single attacker can win against any number of defenders. 

Theorem 7. An attacker with speed s can indefinitely avoid capture against s -\- 1 
defenders, for s < 10. An attacker with speed .? = 10 can avoid capture against any 
number d of defenders. 

Proof. Let us first consider i' < 10. The attacker follows a lazy strategy, which is to 
sit idly unless it is in danger of being captured in the defenders' next turn. Specif- 
ically, if no defenders are in N^[A), the attacker is safe (by Lemma [T]l. If some 
defenders enter A^^ (A), then the attacker can capture the defender closest to it using 
Lemma [T] in a single elementary step, with s —\ steps (and at most s defenders) 
remaining before his turn is up. We repeat the argument from the new location of 
A, until either A is safe for the next turn of the defenders, or it has captured all but 
one defenders. Thus, either A can remain safe indefinitely, or if only one defender 
remains it can win. 

When s> 10, we note that due to the minimum separation constraint among the 
defenders, at most 8 defenders can simultaneously exist inside A^^(A) — clearly, no 
defender lies in N{A) because that is already a captured position, and there are 16 
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cells in N^{A) — N{A), and no two consecutive ones can have defenders in them. 
Figure [5] |(b)| shows A's strategy to capture seven of the maximum possible eights 
defenders in nine steps, and then return to its original position in the 10th step. It is 
easy to check that the attacker can achieve a similar result for any configuration of 
fewer than eight defenders. 

The following theorem, whose proof is omitted due to lack of space, shows that an 
additional increase of speed allows the attacker to capture, and not just evade, any 
number of defenders. 

Theorem 8. An attacker with speed s < \Q can capture s or fewer defenders in 
0{sn) turns. An attacker with speed s = \\ can capture any number d of defenders 
in 0{dn) turns. 



7 Discussion 

We considered a pursuit-evasion game in which two pursuers are required to cap- 
ture an evader. We modeled this game after Kabaddi, which introduces a new and 
challenging game of physical capture for mathematical analysis. We believe that 
Kabaddi offers an elegant and useful framework for studying attack and defensive 
moves against a team of opponents who can strategically coordinate their counter- 
attacks. Our analysis shows that even with two defenders the game reveals signifi- 
cant complexity and richness. 

Our work poses as many open questions as it answers. Clearly, in order to ob- 
tain our initial results, we have made several simplification in the game of Kabaddi. 
While these simplifications do not affect the relevance of our results to multiagent 
pursuit-capture, they are crucial for a proper study of kabaddi. The most signifi- 
cant among them is the proper modeling of "holding the breadth" and "safe return." 
Among the more technical questions, analyzing the game for more then two defend- 
ers remains open in the standard model. The minimum separation rule leads to some 
pesky modeling problems because the attacker could sit in a corner cell and not be 
captured. So some modification is needed in the rules to avoid such deadlocks. Fi- 
nally, we have not addressed the game when more than two defenders are required 
for the capture. 
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Reconfiguring Chain-iype Modular Robots 
Based on the Carpenter's Rule Theorem 

Jungwon Seo, Steven Gray, Vijay Kumar, and Mark Yim 



Abstract. Reconfiguring chain-type modular robots has been considered a difficult 
problem scaling poorly with increasing numbers of modules. We address the re- 
configuration problem for robots in 2D by presenting centralized and decentralized 
algorithms based on the Carpenter's Rule Theorem (4]. The theorem guarantees 
the existence of instantaneous collision-free unfolding motions which monotoni- 
cally increase the distance between all joint pairs until an open chain is straightened 
or a closed chain is convexified. The motions can be found by solving a convex 
program. Compared to the centralized version, the decentralized algorithm utilizes 
local proximity sensing and limited communications between subsets of nearby 
modules. Because the decentralized version reduces the number of joint pairs con- 
sidered in each convex optimization, it is a practical solution for large number of 
modules. 



1 Introduction 

Forming shapes from groups of robotic modules is a goal for many Modular 
Self-reconfigurable Robots (MSRs) and Self-assembling Structures. Such approaches 
often utilize modules with nice space-filling properties ll9l [23llT9ll . The modules re- 
arrange themselves to form shapes that suit the task at hand Il23l [8ll7l. In addition 
to the mechanical issues inherent in building a system that has a desired shape and 
bonding mechanisms, research has focused on motion planning for these modules. 
The problem in this context is to determine collision-free motions for the modules 
to rearrange from an initial configuration to a goal configuration. 

There are three classes of MSRs based on the style of reconfiguration: chain, 
lattice, and mobile Il22ll . Chain reconfiguration involves forming chains of arbitrary 
numbers of modules ET\ [TSl which may break apart, combine into larger chains. 
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or join face-to-face. This class of reconfiguration involves long chains with up to n 
degrees-of-freedom (« is the number of modules); planning for chains requires self- 
collision detection as well as forward and inverse kinematic computations which 
scale poorly as n increases. Randomized path planning techniques have been applied 
to this type of reconfiguration i2l [T6ll . Lattice reconfiguration involves modules that 
sit on a lattice while other modules move around each other to neighboring lattice 
positions. Moving modules from one location to another has been well-addressed 
in the literature HJ] 1121 El- Lastly, mobile reconfiguration uses the environment to 
maneuver and has primarily been explored in stochastic fluidic systems ITSl . 

Modules that are permanently connected by joints can be folded to form rela- 
tively strong structures, as permanent joints can be made stronger than bonds that 
must be able connect and disconnect. Such modules are useful for applications in- 
volving large internal forces (e.g., a reconfigurable wrench). Achieving desired me- 
chanical properties for shapes like the wrench is a goal of programmable matter 
11241 . While one might suspect that requiring modules to maintain a permanently 
connected chain would limit the possible configurations, it has been shown that any 
2D shape can be formed by folding a sufficiently long chain of diagonally connected 
squares |l8l|9l. In three dimensions, origami demonstrates the versatility of perma- 
nently connected folded shapes. Robotic folded sheets have been shown in ifTOl lHll. 
Whereas origami uses uncut sheets of flat material, this work focuses on module 
chain that can be folded into larger structures. 

2 Preliminaries 

2.1 Carpenter's Rule Theorem and the CDR Algorithm 

Consider a linkage of rigid line segments connected at revolute joints to form an 
open chain or a closed chain on the plane. The Carpenter's Rule Theorem states that 
every simple open chain can be straightened and every simple closed chain can be 
convexified in such a way that the links do not cross or touch ||4|. 

Let p = (pf P2 • • • Pn ) ^ denote a configuration of a simple chain of n joints 
by specifying joint coordinates in the plane, p, = (/?,> PnY ■ An example is shown 
in Fig. |l(a)[ For open chains, pi and p„ refer to the two unary joints at the ends. For 
closed chains, n-joints correspond to n-vertices of the simple n-gon. The configura- 
tion space P is defined as a collection of all such configurations. Thus, when joints 
p g P are connected in order, the chain is neither self-touching nor self-crossing. 
Note that we will factor out rigid transformations by pinning down a link. 

We now summarize the result by Connelly, Demaine, and Rote {The CDR Algo- 
rithm) Q. Assume that none of the joint angles (the smaller of two angles at p,) is n. 
Consider the following convex program with respect to V = (vj^ y^ ■■■ yJiY , 
where v, = (v,x v/y)^ is the instantaneous velocity of p,. 
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Fig. 1 (a) An open chain of line segments (bars) with 14 joints. Dashed lines represent some 
of the struts connected to joint 1 . (b) An open chain of rigid bodies. Each base link (black line 
segment) has slender adornments except for the last link. For example, when A traverses the 
adornment boundary from p4 to p5, the distance between p4 (P5) and A increases (decreases) 
monotonically. The distance between pv (pg) and B, however, does not increase (decrease) 
monotorrically. 



minimize 



Ellv,i 
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The set S original of struts is a collection of all joint pairs {(,7} not connected to the 
same rigid bar and the set Bgriginai of bars contains only joint pairs attached to the 
same rigid bar. Since (y j — v,) • (py — p,) can be related to the time rate of change of 
1 1 p j — p, 1 1 , the above formulation asks if one can find an instantaneous motion where 
every joint p^t moves away from all other joints except for Pk-\ and Pi+i, the joints 
connected to p^t by rigid bars. The Carpenter's Rule Theorem proves that this convex 
program always has a feasible solution until any joint angle reaches n. In other 
words, the theorem verifies the existence of instantaneous collision-free unfolding 
(straightening or convexifying) motions. (2) shows that stronger constraints where 
we expand struts at a rate of at least unity are also feasible. (Consider (v^ — v,) • 

lip--*'' II •-* ^^^ motion predicted by above formulation is called a global-scale strictly 
expansive motion. 

Moreover, the solution to Eqns. (1 - 4), /(p), is differentiable in a neighborhood 
Q of p, where 2 is a collection of configurations around p in which any joint angle 
is not n. Thus an unfolding path p(f) can be obtained by solving the dynamical 
system, p = v = /(p). Whenever any joint angle reaches n, the two adjacent links 
must be merged and a new unfolding path for the modified linkage must be obtained 
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at that time. Finally, given a configuration p^, we get p(f ) : [0, T] C M — > P such that 
p(0) = Pt and p(r) = po, where po is the straightened or convexified configuration. 

2.2 Slender Adornment 

Extending the Carpenter's Rule Theorem to linkages of rigid bodies requires a guar- 
antee of universal foldability. A family of planar shapes referred to in the literature 
as having the property of slender adornment |5| provides this guarantee. Slender 
adornment is defined such that the distance between each joint and a point moving 
along the exterior of the rigid body link changes monotonically (Fig. |l(b)l l. 

According to \5\, we can regard each slender link as a line segment {base) con- 
necting two revolute joints such that the whole chain system can be treated simply 
as a mathematical linkage {base linkage) for finding an unfolding path using global- 
scale strictly expansive motions. 



3 Problem Description and Main Contribution 

We shall investigate how to reconfigure a chain-type, modular self-reconfigurable 
robot moving on a plane to move between any two shapes while maintaining con- 
nections between modules and avoiding collisions. The system can be thought of as 
a serial chain (open or closed) of bodies connected by revolute joints. Joint connec- 
tivities are fixed to facilitate development of physical prototypes. 

Specifically, we shall focus on chains of either line segments or cubes (squares 
in 2D). The former can abstract many useful systems such as robot manipulators, 
and the latter is particularly interesting for its space-filling property which enables 
us to represent interesting 2D shapes. As used in the previous section, unfolding 
straightens or convexifies complicated shapes while folding "complicates" shapes. 
We reconfigure by repeating folding and unfolding. 

The main contribution of this paper is the decentralized reconfiguration planning 
based on the Carpenter's Rule Theorem lllQ. Our result allows decoupled planning 
for a class of articulated robot, something long considered infeasible due to interde- 
pendencies of the motions (page 390 in IfTTl ). Compared to the current state-of-the- 
art in planning methods, for example, probabilistic algorithms, our algorithms do 
not need to build a roadmap a priori nor do they need specialized parametrization 
to handle closed loops lfT6ll20l . The formalism from Sec.|2]naturally handles closed 
loops. For example, in Fig. |l(a)| a closed chain can be easily modeled by adding a 
bar between links 1 and 14. The bar is added by adding joint pair {1, 14} to Borigimii 
and deleting the pair from S original- The need to check a randomly generated con- 
figuration for collisions, often the most costly step for a probabilistic algorithm, 
is eliminated. Furthermore, a collision-free path is guaranteed without needing to 
consider the specific module shape. Although we focus on two body types (line seg- 
ments and cubes), our methods can be readily applied to any slender body shape. 
There are some disadvantages as well: the need for slender adornment can be viewed 
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as a constraint and the current formulation cannot handle external obstacles, unless 
used in conjunction with other planning methods. 

We will mainly discuss unfolding motions. Folding motions are obtained by re- 
versing unfolding motions. Thus, unfolding motions can be implemented online, 
whereas folding motions have to be computed prior to actuation. 



4 Modeling Modular Robot Chains 
4.1 Modules 

A module is defined as a pair of links connected by a joint. The /* module will have 
a joint /, "left" link ii, and "right" link Ir (see Fig. |2]l. Observe that half modules 
(including a unary joint) are attached to the ends of an open chain and two modules 
are connected rigidly. Sometimes it may be necessary to fix some modules when 
their joint angle reaches n during unfolding. By abuse of terminology, the meta 
structure will be simply called a linkage where a link can have more than one module 
due to fixing. An example is shown in Fig.|2l 



Fig. 2 A chain-type modu- 
lar robot with cube bodies. 
Each module is delimited by 
dashed lines. The ;* module 
has two half bodies (ii and 
Ir). If we assume that the 
robot is now unfolding it- 
self, joints 6 and 8 are fixed 
so there are two long links 
P5P7 and P7P9. 




Pi p 



Definition 1. The predecessor of a module i, ¥R(i), is the closest unfixed module on 
the lefthand side ofi. Similarly, its successor, SU(/j, is defined as the closest unfixed 
module on its righthand side. The argument can contain a subscript, L or R. For 
example, PR(5) = 4 and PR(5r) = 5 in Fig.^ 



4.2 Representing Self-touching Shapes 

We need to consider how to represent various shapes in a module configuration p. 
If there is neither self-crossing between modules (which can be rigid bodies) nor 
self-touching between bases (Sec. 12.21 ). then the shape can be represented using p. 
A more interesting application, however, may contain self-touchings between bases 
such as filling a region with a cube chain. Assume that the region to be filled is 
depicted as a polyomino since we have square pixels. We first construct a spanning 
tree of the polyomino (Fig.|3]left) by finding a spanning tree of the graph G = {V,E) 
in which we treat each square as a vertex in V and each line segment shared by two 



no 



J. Seo et al. 




Fig. 3 A hexomino resembling a wrench head which is to be filled by an open chain of 24 
cubes. An additional parameter e was introduced to remove self-touchings. 



adjacent squares as an edge in E. By dividing the constituent pixels (or squares) 
into 4 sub-pixels [9] and connecting diagonals of each smaller pixel we can always 
construct a piecewise-linear and simply-connected path which zigzags around the 
spanning tree (Fig. [3] center). This path can be thought of as a desired configuration 
for the base linkage of a cube chain. This, however, introduces a problem since the 
Carpenter's Rule Theorem does not allow self-touching between bases. We intro- 
duce a positive nonzero parameter e to remove any self-touchings (Fig. [3] right). 

Other approximation methods may be used to remove self-touching configura- 
tions. Such methods, however, imply that the faces of a module in a real prototype 
must incorporate a measure of compliance; they cannot be perfectly rigid. 



5 Algorithms for Unfolding 
5.1 The Centralized Algorithm 

We shall directly apply the Carpenter's Rule theorem and centralized computation to 
obtain an unfolding motion. There exists one leader module which gathers position 
information from all modules, computes unfolding paths for them, and orders them 
to move. Theoretically, the leader module will solve p ~ /(p), where the righthand 
side is the solution to Eqns. (1-4) for the set A of active modules after fixing any 
modules of joint angle n to make them inactive. Note that two half-modules at the 
ends of an open chain (and their unary joint) are assumed to be active at any time. 
Recall that /(p) can be integrated to generate a smooth integral curve until any 
additional joint angle reaches n (type I cusp). The linkage is then simplified with 
one less link by fixing the module at n and a new integral curve will be attached to 
the previous one recursively. The centralized algorithm approximating the analytic 
integral curve follows in Table [T] An example can be found in |i8J. 

SOLVEGlobalProgramO solves (1 - 4) to find an instantaneous motion. LlN- 
EAR2Angular() converts linear velocity vectors v into angular velocities ft). For 
example, (O^ can be calculated from V4, V5, and vg in Fig. |l(a)| The current con- 
figuration is then updated by applying (O for At in ReC0NFIGURER0B0T(). For 
practical purposes, we assume small, finite step size At, although the theoretical re- 
sults hold when the step sizes are infinitesimal. Whenever the ;* joint angle reaches 
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Table 1 Centralized unfolding algorithm 

function Centralized-Unfolding returns unfolding motion 
input : p, an initial configuration to be unfolded 

while neither straight nor convex do 

V ^ SolveGlobalProgram(p) 
CO ^ Linear2Angular(v) 

p ^RECONFIGUREROBOT(a), At) 

if one joint = 71 then 

p ^ FlXMODULE(p) 

end if 
end while 

n, the module / will be fixed by simply being eliminated from database and setting 
a new bar between PR(/) and su(i) using FlxMODULE(). 

5.2 The Decentralized Algorithm 

In many cases, it is beneficial to lessen the burden on the leader module. Indeed, a set 
of modules in a local neighborhood can be defined through local proximity sensing 
and used to formulate a decentralized version of the reconfiguration planning. We 
shall show that local proximity sensing can relax the convex program in (1 - 4) 
leading to a new problem with fewer constraints. The decentralized algorithm can 
then be used to compute desirable infinitesimal motions to be combined to construct 
a piecewise smooth unfolding path p(/). Cusps in p(r), however, occur not only 
whenever any joint angle reaches n (type I cusp) but also whenever the proximity 
relationship changes (type II cusp) as the linkage explores its configuration space. 

5.2.1 Sensor Model 

At each joint p,- we will attach a proximity sensor with two radii describing con- 
centric circles centered at each sensor p, with radius rsR and rsR + 5, respectively. 
These two positive parameters will be compared to <^(Pi,y(.)), the minimal distance 
between a sensor at p,- and points on module y'j.), where the subscript can be L or R, 
in order for module / to identify its local neighborhood (Fig. |4(a)] l. For example, any 
yV.) with d{pi,ji.'j) < rsR will be declared as within /'s neighborhood. Later we will 
show the distributed algorithm prevents collisions between module / and its local 
neighborhood (Sec.|522H5231l- 

Finally, let module / gather the following information based on hysteretic behav- 
ior due to the double sensing boundary. 

Definition 2. A^,- is a set of half-modules comprising the neighborhood of i. Half- 
module ji:\ becomes a member of the neighborhood when d{pi,jf.\) < rgg and 
will remain a member of the neighborhood until d{pi,ji.\) > rsR + 5. As soon as 
d{pi,ji.\) = rsR + 5 for any j/.\ £ Nj, ji.\ is no longer a member of the 
neighborhood. 
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(a) 



(b) 



Fig. 4 (a) Sensor model for a line segment chain. Inner circles and outer circles have radius 
r^R and r^R + S, respectively. Red dashed lines starting from pg represent {6,1}, {6,2} e 



temp 



Blue dashed lines starting from pg are {6,4}, {6, 8} £ ^{fj fixed' Since module 5 has 



an empty local neighborhood, only {5, 3}, {5, 7} e 5^(0 /■,«<; are defined, (b) Sensor model for 
a cube chain. 



Intuitively, if / "sees" that half-module y'j.) is inside its inner sensing range, / will 
track 7(.) until it is totally out of sight beyond the outer sensing range. 



5.2.2 Relaxing Constraints and Local Motion 

Before formulating a decentralized algorithm, we shall consider here and in Sec. 
l5.2.3l how to guarantee the existence of an unfolding path under the local proximity 
sensing. It is convenient to assume yet again that a central processor is solving the 
existence problem until the decentralized version is discussed in Sec. 15.2.41 We shall 
begin with line segment chains where r^R can be very small because two mid-link 
points can never collide unless preceded by a joint-joint or joint-link collision. 

Proximity sensors will only be used on active modules because all collisions 
are involved with joints. Based on the sensing result at time t, we can define a set 
^i^Ytemn of struts as a collection of struts from / to predecessors or successors of its 
neighborhood where i E A. The basic idea is to define a set of temporary struts to 
address potential collisions when separation distances are below rsR (Fig. |4(a)] ). 



Sit)Lp = {{/,7}|7 e {PR(A:(.)),SU(/:(.))|VA:(.) G^JI 



(5) 



As Pi gets farther from PR(A:(.)) and SU(A:(.)), the distance between p,- and any point 
on the link connecting PR(^m) and SU(A:/ \) also increases H. This guarantees that 
there is no collision involved with joint ;'. In addition, a fixed set of struts S{tyr-_^^^ 
is defined (Fig. |4(a)] ): 



S{t)ked = {{/,PR(PR(0)},{',su(su(0)}} 



(6) 
(7) 
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The struts in S{t)K^^^^ unfold two active modules on either side of ;. For example, 
{6,4} in Fig. |4(a)| is in charge of unfolding module 5. A set B{t)' of bars is: 

5(r)' = {{/,PR(0},{/,su(0}} (8) 

Note that if B(f )' and S{ty have common elements, they will be removed from 5(f)'. 
Now we can relax (1-4) using the fact that S{t) = Ui'^(^)' C S original and Bit) = 
Ui B{tY = Buriginai- Recall that Soriginai IS the set of all struts between any two active 
modules where one is not a direct predecessor or successor of the other and Bu^iginai 
is the set of all bars between any two successive active modules: 

minimize 2^ j I v,-|p+ Y, 7 ri ^ — n n (9) 

subject to (Vj - V,) • (P; - p,) > IIP; - P;|| , for {;, j} e S{t) C Soriginai (10) 
(V; - V,) • (Pj - Pi) = , for {i, j}eB{t)= Boriginal ( 1 1 ) 

Vfl, = v„, = 0, fli ,(32 are any two successive active modules. (12) 

Since the convex program for Soriginai is always feasible, this relaxed convex pro- 
gram also has a well-defined solution v at any time, v unfolds every joint in a greedy 
manner (due to S{t) fi-^^d) and avoids collisions (due to S{t)temp) while maintaining 
the rigid constraints mB{t). 

Local Motion for a Cube Chain 

For a cube chain, rsR cannot be arbitrarily small because two mid-link points can 
collide in contrast to the line segment case. In other words, the size of a cube body 
determines minimum allowable sensing range. The value can be obtained using sim- 
ple geometry in terms of I, the side length (Fig. |4(b)| . If d{pi,jt\) refers to the 
minimal distance between sensor p, and module ji.\, rsR should satisfy: 

rsR > e (13) 

Recalling that a cube can be also treated as a line segment (base), we may want 
to make a sensor to detect points on the bases. Then d{pi,j(^:-j) now refers to the 
minimum distance between a joint and a base; a greater sensing range is needed. 



rsR> {I + V2/2)xe (14) 

As in the case of the line segment chain, only active modules will use their proximity 
sensors. Since rsR is larger than a module, any collision will happen only after a 
sensor detects danger. It is, however, not sufficient to construct 5(f )Jg,„„ from module 
/ which has found some neighborhood since the neighborhood can collide with other 
points on links adjacent to /. So, additional struts are needed to prevent these mid- 
link collisions. To avoid collision between two links of slender symmetric adornment 
like the cube chain, we should try to expand all distances between the end points of 
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the links provided they are not connected by a bar IS). This idea can be readily 
implemented by defining S{t)'fi.^„/. 

S{t)Lns = {{',7}|7e {PR(^(.)),SU(^(.))|VA:(.) e A?PR(,.)U^su(/)}} (15) 

For example, sensor 5 in Fig. |4(b)| will also construct struts {4, 99}, {4, 100} G 
Sit)l,„, and {7,99}, {7, 100} e 5(0/„„„ in addition to {5,99}, {5, 100} G S(f)Lp. 
These additional struts guarantee no collision between link P4P5 (or psp?) and 
P99P100- The definitions for other sets are the same except for: 

Sity = 5(f )Lp U 5(0>,w U 5(r);,„,„ (16) 

We can also obtain the relaxed problem (9 - 12) to get v at any time. 

Hybrid System Model 

We have shown that there exists an instantaneously safe unfolding motion v in spite 
of the relaxation. We shall call it local-scale strictly expansive motion in contrast to 
the global-scale strictly expansive motion. Since S{t) changes over time, the vector 
field on the right hand side of p = /(p) changes over time. Thus, we have a set of 
all possible vector field 5^ = {/i (p) , /2 (p) , • • • } , forming a hybrid system, in contrast 
to the single vector field /(p) of the original formulation. ^ corresponds to a set of 
all possible combinations of 5'(f) and B{t) and the domain of /;(p), C/,, can be repre- 
sented as an open subset in the configuration space P which satisfies J (pa , ^r.) ) > r^R 
for some a,bi\ (they are not in a local neighborhood) and d{^siU-)) < fSR + ^ for 
some s,t(^.^ (they are in a local neighborhood). The following theorem will be useful 
to construct the global solution. 

Theorem 1. Each vector field fi{p) in ^ — {/i (p),./2(p), • • • } is dijferentiable with 
respect to p in [/,■ f] Q where Ut is /i(p) 's domain and Q is a collection of configura- 
tions around p in which any additional joint angle is not n. 

Proof. We only have to check that /;(p) satisfies five conditions from Lemma 7 
in im which established the smooth dependence of the solution on the problem- 
definition data A(p) and i>(p) in parametric optimization problems of the type: 

min{g(p,v) : v G r2(p) C M'", A(p)v = Z7(p)} (17) 

Our relaxed problem can be regarded as this type where g(p, v) refers to the objec- 
tive function (9) and f2(p) the feasible set from (10). A(p) and fo(p) can be con- 
structed from (11) and (12). 

1. Is the objective function twice continuously differentiable and strictly convex as 
a function ofy G ^(p), with a positive definite Hessian, Vp G [//flQ •'' 

The objective function is the sum of quadratic functions and additional smooth 
convex terms. Thus it is twice continuously differentiable and strictly convex. 

2. Is 12 (p) an open set, Vp G t/,n G ? 

X2(p) is open since the inequalities (10) are strict. 
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3. Are the rows of the constraint matrix A{Tp) linearly independent, Vp G f/iflQ ■'' 

Because the equality constraints are the same as the original formulation, this 
condition still holds |4||. 

4. Are A(p), /^(p), and Vg (with respect to y) continuously differentiable in p € 

A(p), b{p) are linear. Vg is also continuously differentiable from the fact in 1. 

5. Does the optimum point y* (p) exist for every p G f/; fl 2 •'' 

The relaxed problem is also convex. We can always find a unique solution. 

In conclusion, /,(p) satisfies the five conditions establishing the smooth dependence 
of/;(p)onpinC/,ne. □ 

5.2.3 Global Motion with Hysteretic Behavior 

We now show how to construct an integral curve (unfolding path) on the configura- 
tion space governed by the hybrid system and hysteretic behavior in sensing. 

Theorem 2. Consider the hybrid dynamic system p — y — f.) (p) where fi\ (p) € ^ 
and assume the hysteretic behavior from Definition 2. There exists a unique integral 
curve which represents the unfolding path from given initial configuration p^. 

Proof. Given a configuration pc, we can designate a unique vector field until one 
of the neighborhood memberships expires or a new membership is issued. In fact, 
Pt is located strictly inside the domain of the designated vector field since we have 
a nonzero margin before a new membership is issued or an existing membership 
expires due to the hysteresis. 

Recalling from Theorem [T] that each /(.)(p) is differentiable, we can define a 
unique maximal integral curve in the domain which cannot be extended beyond a 
certain positive limit on time T <o° starting from p^ . But the integral curve should 
reach the boundary of the current vector field (type II cusp) in finite time because of 
the finite growth rate (at least unit rate) of the strut constraints in S{t)temp- 

As soon as it reaches the boundary, a switching of vector fields occurs and a 
unique integral curve will be constructed again exploiting the fact that the switching 
point is also located strictly inside new vector field's domain due to the hysteresis. 
This new integral curve will be connected to the existing integral curve, but these 
processes will last only finitely until we get to a type I cusp since the struts in 
S{t) fixed are also growing at least with unit rate at any instant. Type I cusps can also 
appear only finitely. Therefore we can finish the unfolding in finite time. D 

Hysteresis plays an important role in the above theorem. It guarantees that the points 
where vector fields switch are actually located strictly inside new vector field's do- 
main. Thus it allows local motion to be always computed by a well-defined vector 
field exclusively from ^. What would have happened without the hysteresis? 

Each vector field in ^ has repulsive nature in that any integral curve starting from 
strict inside of its domain tends to escape the domain by expanding struts which will 
change current neighborhood relationship. If there is no hysteretic behavior, in other 
words, if 5 — > 0, this repulsive nature of vector field is very likely to result in sliding 
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mode ll6l [T4ll on domain boundaries. To describe sliding mode, we need a third vec- 
tor field which is different from the two vector fields on either side of the boundary. 
This new vector field, however, may not be desired since it is not an element of ^ in 
which every element was proved to generate safe local motions. To be more specific, 
the new vector field may not guarantee local expansiveness for collision avoidance 
or more than unit rate of strut expansion for convergence and completeness. Thus 
the hysteresis is required for the solution to avoid any undesirable sliding mode. 

5.2.4 Decentralized Algorithm 

Recall that A = {ai,a2,--- a„, } is a set of active modules. Without loss of generality, 

we can rewrite it as A = { 1 , 2, ■ ■ • ,m} since we have been working only with active 

modules. We then have m subsystems where each module k is coupled only with 

modules in 4, a set of all modules which appear in 5'(f )* U B{t)'' U D{t)^, where 

D{t)^ is a set of struts which ends at k, for example, {6, 1} G D{ty in Fig. |4(a)| In 

other words, considering £>(?)*' means that we will take the modules which detect k 

into account. Also note that kEl/^. No matter how many active modules there are, we 

only have to maintain a limited number of local contacts which can be found using 

PR(-) and su(-) pointers. The pointers do not require global information so each 

module doesn't need to have a specific ID. The number of required local contacts 

for each module has an constant upper bound particularly for a cube chain since 

the body shape and the sensing area are compact on the plane. Compared to the 

centralized algorithm in which every active module is coupled with all others, this 

fact allows us to formulate a decentralized algorithm by decomposing Eqns. (9 - 

12). Assume that each module is equipped with a perfect localizer as before. First 

we shall designate a reference module to which every localizer should refer or pin 

down a module to address (12). 

Now we need some concepts and notations from fT|. A hypergraph can be used 

to represent the decomposition structure where the nodes are active modules and 

the nets (or hyperedges) are constraints among them. For example, the fact that ve 

should be shared by six others (modules 1, 2, 4, 5, 7, and 8) in Fig. |4(a)| corresponds 

to a net (hyperedge) in the decomposition structure. Let u^ E R^* be a collection of 

velocity of modules which belong to 4. If 4 = {fn, n, k, p, q}, then an example would 

be, u, = (v^ v„^ v[ vj v,y)^eR^*.Letu=(u[ u^ ■■■ u^,)^ e K^, 

X =Xi-\ \-X„,. We will use the notation (m), to denote the ;* scalar component 

of u. The basic strategy is to independently solve for each module k which has a cost 

function hif{uk), and constraints u^ G £i-, which is a subset of constraints in (10) 

and (11) featuring vj^, and impose the nets to establish consistency among shared 

variables. In terms of u, various components of u should be the same as defined in 

a net. This idea can be efficiently represented by introducing a vector z e M*^, 7 is 

the number of nets, which gives the common values to each net by calculating Ez 

where EeR^""^: 

Jl (m)/ is in net ;■ 

'■' " 1 otherwise ^^' 
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Table 2 Decentralized unfolding algorithm for an active module k 

function Decentralized-Unfolding returns instantaneous unfolding motion ut 
input : p/j , position of modules in 4 

while A^i^> e (small positive constant) do 
Vik ^ SOLVELOCALPROGRAM(P4,/iA) 
Hk ^ UPDATEPRICES(/it) 

end while 

Lastly, XetEji £ R^*^'' denote the partitioning of the rows of £ into blocks associated 
with module k such that u^ = Zs^z. 

Our problem is then to solve a master problem: 

m 

^ (19) 

(20) 
(21) 



(22) 



Note that we have the same set of constraints as the previous formulation since the 
union of Q's is equal to (10) and (11) and we already have addressed (12). The 
above formulation is, however, different from the previous formulation since the 
cost function X"=i hk{\ik) is different from (9). Still, there is no problem to apply 
the results in Sec. l5.2.2l and l5.2.3l since S™=i hk{\ik) is just another weighted sum of 
terms in (9) which do not affect convexity and differentiability. 

We will apply dual decomposition featuring a projected subgradient method HI 
to solve this master problem. Then each active module only solves the following 
local problem: 

minimize hii{viji) + /iju^ subject to Ujt G €k (23) 

To be more specific, this can be elaborated as: 

minimize ^11 v,-||^+ Yj 1 \~i ^ — n T{ + I^I^k (24) 

subject to (v,- - V,-) • (p; - p,-) > Up,- - p,-|| , for {/, ;} e S{tf U D{tf (25) 
(y,- - V,) • ivj ~ p,) = , for {ij} e B{tf (26) 

where jXk is the Lagrange multiplier for u^. 

Each active module runs the following decentralized algorithm independently. 







minimize ^ hk{\Xk) 

k—\ 








subject to U;t e dV, ^ = 1 , • • • , m 








Uk^EkZ, k^l,---,m 




will let 








hk{nk) -~ 


= Ellv,- 


Ih I ^ 




{,-.;}GS(f)*U£.(f)* (^' " ''J^ ■ ^P' " P>) " 


-||Pi-p<il 
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Table 3 Computation time ratio. Implementation was done in Matlab and used cvx; the 
results have been normalized by the time needed to compute the Humanoid reconfiguration 
using the centralized algorithm. Note that snapshots are rescaled for visibility. 

Shapes Reconfiguration Relaxed solution computation time / 

Centralized solution computation time 



Gripper (16 modules) ™ ^ ^X/*^ T> 0.03/0.05 




Hammer (32 modules) " M ^ \ 0.18/0.55 



Humanoid (48 modules) ^ \ 0.19/1 



SolveLocalProgramO solves (24 - 26). To update the Lagrange multiplier 
jj.); in UpdatePricesO, a module k has to communicate only locally with others 
in 4. Refer to Q] for details about updating prices. Recall again that each module 
finds linear velocity as a result of this computation. Thus each module may need 
to be equipped with a synchronized clock to stop above decentralized work, trans- 
form linear velocity into angular velocity, and actuate its joint simultaneously. As 
mentioned earlier, we need to reverse unfolding motions to get folding motions. 

We have compared the centralized algorithm (1 - 4) to the relaxed problem 
(9 - 12); the results are shown in Table 3. The results have been scaled to show 
the improvement in computation time between the formulations and allude to the 
promising results expected from the end-to-end decentralization algorithm of Ta- 
ble 2. Note that the relaxed problem takes much less time even though there is 
an additional step to find neighborhood, holding other conditions constant. Thus 
the fully decentralized implementation, deferred for future work, is expected to be 
much faster with less communication. The results have been scaled by the time 
needed to compute the Humanoid reconfiguration using the centralized algorithm. 
Note that this example was not optimized for runtime efficiency, but rather serves to 
show how the algorithm scales with the number of modules as well as the difference 
between centralized and relaxed versions. 



6 Conclusion 

This work presented practical algorithms for collision-free reconfiguration planning 
for chain-type modular robots. Both the centralized and decentralized versions were 
developed using the Carpenter's Rule Theorem and maintained connections between 
modules. The decentralized algorithm will be particularly beneficial as the number 
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of modules increases since each module (without a unique ID) only has to maintain 
a limited number of contacts. 

We are interested in testing our algorithms by implementing the sensor model 
in a current modular robot system. The pointers, PR(-) and su(), will be easily 
implemented using a neighbor-to-neighbor communication scheme. As mentioned 
before, the capability of our algorithms to deal with close-packed configuration can 
be integrated with other popular motion planning methods to handle obstacles and 
improve overall performance. 

Acknowledgements. This work is funded in part by DARPA Grant W9 11 NF-08- 1-0228 
(Programmable Matter). 
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Robomotion: Scalable, Physically Stable 
Locomotion for Self-reconfigurable Robots 



Sam Slee and John Reif 



Abstract. Self-reconfigurable robots have an intriguingly flexible design, com- 
posing a single robot with many small modules that can autonomously move to 
transform the robot's shape and structure. Scaling to a large number of modules is 
necessary to achieve great flexibility, so each module may only have limited pro- 
cessing and memory resources. This paper introduces a novel distributed locomo- 
tion algorithm for lattice-style self-reconfigurable robots which uses constant mem- 
ory per module with constant computation and communication for each attempted 
module movement. Our algorithm also guarantees physical stability in the presence 
of gravity. By utilizing some robot modules to create a static support structure, other 
modules are able to move freely through the interior of this structure with minimal 
path planning and without fear of causing instabilities or losing connectivity. This 
approach also permits the robot's locomotion speed to remain nearly constant even 
as the number of modules in the robot grows very large. Additionally, we have devel- 
oped methods to overcome dropped messages between modules or delays in module 
computation or movement. Empirical results from our simulation are also presented 
to demonstrate the scalability and locomotion speed advantages of this approach. 



1 Introduction 

Throughout nature, a recurring concept is that of a collection of simple structures 
combining to form something much more complex and versatile. Self-reconfigurable 
(SR) robots seek to implement this concept using a collection of robotic ''modules" 
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Fig. 1 Left: Three modules at the back of a Robomotion tunnel coordinate to move into that 
tunnel. Right: A Robomotion tunnel snakes around two brown obstacles. Yellow modules 
are moving. 



which can autonomously move to reconfigure the overall shape and structure of the 
robot. A key problem faced by these robots is how to generate locomotion. Many 
prior algorithms have done this, but very few have been able to scale to robots with 
very large numbers of modules while only using robotic structures - configura- 
tions of modules - which are stable in the presence of gravity. We introduce such 
a method here with our Robomotion algorithm, which focuses on 3 main goals: 
scalability, speed, and stability. 

For Goal 1, scalability, we mean that the algorithm could reasonably be expected 
to execute on a SR robot, implemented in hardware, composed of thousands or 
millions of modules. This means that the memory per module must be sub-linear in 
the number of modules and decisions made by each module can only be based on 
local information (i.e. from nearby neighbor modules). As we describe in Section[3] 
our algorithm achieves per module constant bounds on memory, processing, and 
communication. This has allowed us to simulate robots with over 2 million modules. 

For Goal 2, speed, we are actually interested in 
the locomotion speed of the robot. There are two 
parts to creating fast locomotion: the physical move- 
ment approach and the means of controlling it. For 
large collections of modules, a small group of them 
will not be strong enough to move the rest of the 
robot. Prior work avoided this limitation by hav- 
ing some modules remain stationary while other 
modules flowed from the robot's back to its front. 
We have also adapted this approach for our Robo- 
motion algorithm. For equally scalable control, a 
very distributed algorithm is required. Our algorithm 

achieves this by using straight tunnels through the robot's interior through which 
modules can travel. These tunnels act much like highways: fewer movement op- 
tions allow for increased speed and efficiency. 

For Goal 3, stability, we want to guarantee that our robot will not collapse under 
its own weight. This is a difficult global property to maintain because SR robots 
are able to form so many different shapes, many of which are unstable in the pres- 
ence of gravity. Figure |2] shows how a slight change to 1 module's position can 







Fig. 2 A: A subtle change mak- 
ing a big stability difference. 
B: A convex-comer transition 
from side to top by Module S. 
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dramatically affect stability. In Config 1, the inter- module connections A i-Bi and 
A2-B2 experience very small tension and shear forces due to the weights of mod- 
ules Bi and B2, respectively. However, by moving 1 module, in Config 2 those same 
inter-module connections must support all modules stacked above Ai, a far greater 
amount of shear and tension force bounded only by the height of the stack. Com- 
pression forces in other parts of the robot are about the same for either situation 
but are less important. Similar to large manmade structures, most SR robot hard- 
ware implementations handle compression forces much better than shear or tension 
forces on inter-module connections. 

Although difficult to maintain, stability is a critical property. A real robot which 
loses efficiency will just move slowly. A real robot which loses stability will break 
hardware units. This cannot be a property that is probabilistically met. It must be 
guaranteed. For our Robomotion algorithm presented here, we guarantee stability 
while assuming only a very limited set of physical abilities for modules. 

In addition to our 3 main goals, we make our algorithm more realistic by han- 
dling potential hardware errors: message drops and module delays in computation or 
movement. We also disallow convex-corner transitions by individual modules (move 
around a convex corner formed by 2 orthogonal surfaces as in Figure |2l Part B) as 
many hardware implementations have had difficulty executing these movements. 

To accomplish our goals, our algorithm has a distributed approach through co- 
ordination within small groups of modules. These groups are dynamically formed 
whenever they are needed, and disassembled when no longer needed. In a sense, 
modules act Uke biological stem cells, able to join any group and take on whatever 
role is required. Dynamically forming a group requires consensus that each module 
has agreed to join the group and precisely which role each will take. Proving that 
a consensus will always be reached becomes difficult when the needs of the robot 
could change at any time (i.e. the robot could be instructed to reverse its locomotion 
direction or to turn), when different modules hear about those changes at different 
times (since we rely on module-to-module communication), and when messages 
sent between modules are not guaranteed to always be received. 

The remainder of our paper is organized as follows. In Section |2] we survey the 
relevant literature. In Section [3] we give a general overview for how Robomotion 
addresses our goals described above. Section|4]describes our simulation of Robomo- 
tion and gives results from several experimental trials we used to compare our algo- 
rithm to the leading prior work on this topic. In Section|5]we present our conclusions 
and describe future work. Detailed pseudo-code and proofs are omitted here but are 
included at http://www.es. duke. edu/^^reif/paper/slee/robomotion/robomotion.pdf. 
The primary contributions of this paper are Robomotion's algorithmic design and 
the experimental results from its simulation. 



2 Related Work 

In order to achieve highly scalable locomotion - for robots with thousands or mil- 
lions of modules - we must have distributed control algorithms that use sub-linear 
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Fig. 3 Left: A Waterflow style algorithm by Butler et. al. |i4| Middle: The Million Module 
March algorithm. [SI) Right: Robomotion in simulation. 



memory and processing per module. The best previous example is the Million Mod- 
ule March (MMM) algorithm by Fitch and Butler fSj. The robot's goal is to reach a 
specified goal region and each module executes the same, limited plan: repeatedly 
calculate its best route to a desired goal region based on the routes chosen by neigh- 
boring modules. In this way modules closer to the goal region pass information back 
to modules further away. This approach is flexible and was shown to work on a robot 
with just over 2 million modules. However, it did not consider physical stability and 
so would create configurations which would be unstable in the presence of gravity. 
Also, the constant replanning of routes in MMM can lead to inefficient locomotion. 

Aside from MMM, the other main approach to highly scalable locomotion for 
lattice-style SR robots is not a specific algorithm but a general technique known as 
waterflow or cluster flow. In this algorithm, modules in the middle of the robot stay 
in place while modules at the back move up, slide across the top, and then come 
to the front of the robot to reattach there. Continual repetitions of this generate lo- 
comotion that looks like water flowing along the ground. Algorithms implementing 
this technique have been made by several research groups |[TJI11I21- 

As other researchers have noteco the key to generating fast locomotion with these 
techniques is to have a high ratio of moving modules to total modules in the robot. 
In this paper we refer to this ratio as the Simultaneous Active Movement (SAM) rate 
which we describe further in Section[3] Both the waterflow technique and MMM re- 
strict module movement to the exterior surface of the robot. For dense robot shapes, 
which are more likely to be stable, we would expect the SAM rate to drop as the 
number of modules grows and more modules become trapped in the robot's inte- 
rior. We empirically verify this with our simulation results given in Section |4l and 
also show that Robomotion instead maintains a high SAM rate by allowing interior 
module movement. Of course, the SAM rate of an algorithm will drop even faster if 
we use a slow control algorithm. One highly scalable implementation of waterflow 
by Butler et. al. used stateless local rules to reduce computation |4j. However, later 
analysis found this approach to be unwieldy as standard program-based control had 
tens of rules while stateless local controllers had hundreds of rules. ^ 

General reconfiguration algorithms could also be used for locomotion by contin- 
ually requesting new configurations in the desired direction of movement. However, 
most are not highly scalable as they require linear memory for at least one module. 



' From page 5 of 1 1 1: "the speed is proportional to the ratio of moving meta-modules relative 
to the total number of meta-modules." 
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Some are also too slow due to requiring linear time for planning or, worst of all, 
using a centralized planner. One reconfiguration algorithm that did use sub-linear 
memory is the scale-independent algorithm of Nagpal and Stoy lfT4l . This specified 
the goal region as overlapping rectangular boxes and then placed modules into the 
goal region relative to the scale of the boxes. This algorithm forms static scaffolding 
in the goal region first so other "wandering" modules can flow freely through the 
gaps that are formed. However, Nagpal and Stoy found that their scheme had worse 
performance compared to traditional reconfiguration algorithms in terms of number 
of moves, time steps, and messages. 

A similar reconfiguration algorithm by Stoy was not scale independent but still 
used that scaffolding technique 113|. Here static modules in the scaffolding would 
send out signals to attract wandering modules to new locations. This algorithm could 
be adapted to locomotion but would have some movement inefficiencies since mul- 
tiple wandering modules can be attracted to a single open location. Also, no guaran- 
tees about physical stability are made and the algorithm's guarantee for connectivity 
(a property described in Section [T2t only holds if modules in the scaffolding never 
move once they join that scaffolding. For locomotion, portions of the scaffolding 
would continually need to be removed and moved elsewhere. Doing this safely be- 
comes difficult when the robot could change its movement direction at any time 
and module actions are asynchronous. Another approach given by Ravichandran, 
Gordon, and Goldstein IfTOll used only O(logn) memory per module while finding 
a bijection between initial positions of modules and the desired target positions. 
However, that work did not focus on the motion plan to reconfigure those modules. 

Some reconfiguration algorithms allow "tunneling" module movement through 
the interior of the robot's structure ifTSl [3, a key property of our own Robomo- 
tion algorithm. However, since there is no central coordination between the planned 
paths for modules, the SAM rate for these prior tunneling algorithms is likely to 
be similar to that of MMM, which also uses a decentralized, greedy approach. The 
SAM rate for the MMM algorithm is considered in Section |4] of this paper. Also, 
these tunneling algorithms do not guarantee physical stability. 

The Robomotion algorithm described in this paper is for lattice-style SR robots, 
but fast locomotion has been demonstrated in hardware for small chain-style SR 
robots. In this style the robots form kinematic chains or loops. Work by Yim et. al. 
may have the fastest locomotion demonstration ifTTIl . but other work by Yim ifTTll . 
Shen et. al. ITSl . and Murata et. al. ||9l [161 show various locomotion gaits with 
walkers, snake-like sidewinders, or rolling loops. However, if gravity is considered, 
these techniques would not scale to moving very large robots (with thousands or 
millions of modules) and are not directly applicable to lattice-style robots. 

Physical stability occurs in hardware demonstrations for SR robots, but the topic 
of stability has not been heavily studied for highly scalable theoretical algorithms. 
Prior work by Shen et. al. has shown how to balance a chain-type SR robot by 
calculating the center of mass of the robot in a distributed manner lIS). However, this 
did not consider physical stability in the sense of guaranteeing limits on shear and 
tension forces experienced between adjacent modules. These stability guarantees 
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are a key contribution of the Robomotion algorithm presented in this paper, along 
with its high scalability and movement efficiency benefits. 

3 Overview of Intuition for Robomotion 

To make our theoretical algorithm applicable to as many lattice-style robot hardware 
types as possible, Robomotion assumes the commonly-used Sliding Cube abstract 
model [6|. In this model each module is represented as a cube and modules are able 
to slide along flat surfaces formed by other cubes and make convex and concave 
transitions between those surfaces. Communication occurs with messages passed 
between adjacent modules. It has been shown that algorithms made for this model 
can be executed by a range of hardware implementations [61. To further extend the 
applicability of our Robomotion algorithm, we disallow convex transitions as this 
action is hard for many hardware implementations. We now describe our Robomo- 
tion algorithm and the intuition behind it. Again, our main goals for this work were: 
fast locomotion speed, physical stability, and high scalability. 

3.1 Fast Locomotion 

To help evaluate the speed of our locomotion algorithm, we define the Simultane- 
ous Active Movement (SAM) rate as the ratio of the number of moving modules 
compared to the total number of modules in the robot. Prior work has noted how 
locomotion speed for lattice-style SR robots correlates closely with this metric |[T1 
and that this rate tends to correlate with the exterior surface area of the robo{3 EEl. 
Asymptotically, an ideal SAM rate would be 1:1. This occurs when, on average, a 
constant fraction of modules in the robot are able to simultaneously move. To get a 
high SAM rate we'll need to have the surface area of the robot - which is the area 
along which modules can travel - to be asymptotically equal to the volume of the 
robot. A robot only two modules high could have the top surface of modules move 
for a 1 : 1 SAM rate, but in general as the number of modules grows large keeping one 
dimension asymptotically shorter than another is difficult to achieve or maintain. 

Instead, our Robomotion algorithm uses interior movement to reach a 1 : 1 SAM 
rate. Modules begin at the back of the robot and move to new positions at the front, 
just like in the Waterflow approach described in our Related Work Section. However, 
now modules move through interior tunnels instead of across the exterior surface of 
the robot. By using a constant fraction of the robot's volume to move modules, we 
can achieve a 1 : 1 SAM rate. The tradeoff is that interior movement makes physical 
stability a harder property to maintain. 



^ From page 9 of (5): "We note that for simple cubic shapes, the surface area of the robot 
increases with n ' , so in fact we should expect parallelism relative to n to decrease as the 
robot gets bigger." 
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3.2 Stable Locomotion 

Most SR robot algorithms do not consider stability, but most do maintain another 
global property: connectivity. For this property, each module in the robot has a path 
(series of physical connections) to any other module in the robot. For stability, it 
helps to consider the common prior approach to connectivity: safety searches. 

Prior Solution: Safety Searches. With this method, before any module m moves 
it first finds alternate paths to connect its neighbors. Module m then locks into place 
any modules on these paths before m itself can move. This technique was used by 
the highly scalable MMM algorithm JSl. However, there are three main drawbacks 
to this approach. (1) If we allow unbounded searches, a single search could be very 
slow. In the worst case we would have a single loop of all n modules in a robot and 
would have to go through all of them to make an alternate path, finally allowing 1 
module to move. (2) If we bound searches, then we might miss the opportunity to 
move modules that could have safely moved, thereby lowering the robot's SAM rate. 
Finally, (3) without any global coordination, the movement of modules can become 
very random and the structure formed for the robot becomes hard to predict. 

For modules moving through the robots interior, these paths are really tunnels. 
Without global coordination, random module movements can quickly create com- 
plex internal mazes of tunnels and verifying stability becomes much more complex 
than the simple "find any path" searches needed for connectivity. Just as we would 
not trust the stability of gold mining tunnels that were randomly dug above and be- 
low each other, for an SR robot we need to be more coordinated in how we form 
these interior movement tunnels. 

Our New Solution: Support Columns. Our Robomotion algorithm avoids these 
tricky situations by using static support columns placed at repeated intervals through 
the robotic structure. Modules in these support columns do not move and so can 
provide support. Modules between these columns attach to them for support and are 
free to move without fear of causing instabilities. To form these support columns, 
we use repeated groupings of 3 modules which we refer to as L-groups. An exam- 
ple L-group is pictured at the far left of Figure |4] while the other portions of that 
figure show how L-groups can be stacked into a column or into a set of adjacent 
columns to form a full robot structure. The gaps in pictures (b) and (c) of Figure |4] 
- which account for 1/4 of the available volume in those structures - show where 
other modules could travel freely through the structure. If we can keep new modules 
constantly moving through these gaps then we'll have our desired 1:1 SAM rate. 

Define a slice of modules as a set of modules all having the same coordinate posi- 
tion for a given axis direction (assuming those directional axes are aligned with the 
rows and columns of our robot's lattice structure). Robomotion generates locomo- 
tion by repeatedly disassembling the columns of modules in the backmost slice of 
the robot, sending those modules through the interior tunnels, and finally assembling 
a new slice of modules (forming new support columns) at the front of the robot. We 
can actually disassemble or assemble an entire slice in parallel and still maintain 
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Fig. 4 From left: (a) a single L-group; (b) 1 line of columns; (c) multiple adjacent column 
lines to show what a full robot looks like; (d) our simulation implementation with L-groups 
in purple and moving modules in green. 



stability. Keeping columns close together bounds the number of modules hanging 
from the side of any given column module. Assembling at most 1 slice at the front, 
and 1 slice at the back, of the robot at a given time limits the number of modules 
hanging from the front and back of any column module. If each module has roughly 
the same weight, this also means that we've bounded the amount of shear or tension 
force experienced on the connection between any pair of modules. 



3.3 Scalable Locomotion 



This support column structure also permits a highly scalable locomotion algorithm. 
The only safety checks we need come while assembling or disassembling a column 
(as part of a slice). These checks are: (1) consensus between modules in an L-group 
to decide to disassemble or to decide if assembly of that L-group is complete, and 
(2) messages between L-groups (in the back slice) from the top down to state when a 
column is ready to be disassembled, and messages from the bottom up (front slice) 
to state when a new column has finished being assembled, and (3) 1 -directional 
horizontal messages (perpendicular to movement direction) between L-groups to 
state when a slice is ready to be disassembled or has finished being assembled. 

Only constant-bounded memory and communication is needed. Disassembly or 
assembly communication really only involves 5 modules: the 3 modules in an L- 
group, a module from an adjacent L-group above (for disassembly) or below (for 
assembly) stating that the column is ready, and a module from a side-adjacent L- 
group stating that a slice is ready. Thus, a module only needs to know state about 
its own L-group (if it's in one) and to receive messages from neighbors. Thus, each 
module needs only a constant-bounded amount of memory if we put a limit on the 
number of messages "yet to be read" for any module. Overflow messages could 
be dropped as that challenge is also handled by Robomotion (as described later in 
Subsection l3.6l ). Modules moving through the robots interior can just follow direc- 
tions given by adjacent L-group modules. Therefore moving modules only need to 
remember their state (i.e. that they are moving) and the most recent move direction 
order received. 

Only constant-bounded processing is needed as well. Inside of the robot, only 1 
module per L-group does anything: it sends direction messages to moving modules 
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Send MOVE-DIRECT messages to any adjacent FREE modules. 

if IS-BACKEND == TRUE and DISASSEMBLE-NOW == TRUE then 

Send a BECOME-BACKEND message to module in FORWARD direction. 

{switch to being a FREE module after getting the reply message. } 
end 

it given a BECOME-BACKEND rae^Mge then 
I Set IS-BACKEND = TRUE. 

end 

if IS-FRONTEND == TRUE and ASSEMBLE-NOW == TRUE then 

Send a BECOME-L-GROUP message to any module in FORWARD direction. 
(set IS-FRONTEND = FALSE after getting the reply message.) 

end 

it given a REVERSE-DIRECTION me.?iage then 

Swap the values of IS-FRONTEND and IS-BACKEND. 

Swap the values of FORWARD and BACKWARD directions. 
end 



Algorithm 1. The L-GROUP module algorithm, (high level) 



it given a MOVE-DIRECT message then 

I Execute the requested move, if legal. 
end 
it given a BECOME-L-GROUP rae.srage then 

I Switch to being a L-GROUP module in the current location. 

I Set IS-FRONTEND = TRUE. 

end 



Algorithm 2. The FREE module algorithm, (high level) 

which pass by. Moving modules just follow orders and thus require almost no pro- 
cessing. The only remaining work is the safety checks performed when assem- 
bling/disassembling columns of L-groups. This entails (1) directing any adjacent 
moving modules to the right locations, and (2) agreeing with other members of 
the same L-group when its time to disassemble or when assembly is complete. So 
Robomotion needs only a constant-bounded amount of processing per L-group per 
module action (or action request in case a moving module freezes and ignores re- 
peated move direction messages). Thus, we have now outlined how Robomotion 
meets all of our stated main goals: fast locomotion, guaranteed physical stability 
and connectivity, and high scalability (limited memory / processing / communica- 
tion per module). 



3.4 Algorithm Outline: One Tunnel 

Control for Robomotion is mostly done by controlling a single tunnel and then re- 
peating that control structure for each tunnel in the robot. We assume an initial con- 
figuration of n modules, shaped as a contiguous set of solid vertical columns with 
an even number > 2 of modules in each dimension. We assume the robot begins 
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execution with each module knowing its initial role: either being a stationary L- 
GROUP module or being a moving FREE module. Each L-GROUP module begins 
with knowledge of the forward direction (initial direction of desired locomotion) 
and the locations of the 2 other members of its L-group. L-GROUP modules at the 
front of the robot begin with an IS-FRONTEND = TRUE status and L-GROUP mod- 
ules at the back begin with an IS-BACKEND = TRUE status while all other modules 
begin with FALSE for those values. If desired, we could instead broadcast a sin- 
gle message with the desired direction of movement to all modules and then each 
module could quickly calculate its own role and L-group neighbors. 

Modules may send and receive messages with neighbors to which they are di- 
rectly attached. We assume these messages may be lost when communicated but, 
if sent successfully, they arrive instantly. Modules may delay in their movement or 
computation but will never fail. To handle message drops, we use a question-reply 
format. One module A will repeat its "question" message to module B until B sends 
back a "reply" message confirming that it successfully received A's message (or un- 
til A no longer desires to send its initial message). Module B just assumes that, if its 
reply message is lost, there will be another chance to reply when A repeats its initial 
question message. All messages used by Robomotion are designed so no error oc- 
curs if duplicate messages are received. For control of a tunnel as a single unit, we 
assume an external controller (i.e. a human at a laptop) which can broadcast signals 
to the robot to tell it to reverse its movement direction or make an orthogonal turn. 
These commands are then executed by individual modules in a distributed fashion. 

Given the initial movement direction chosen, our goal is to generate locomotion 
in that direction indefinitely, or until a REVERSE-DIRECTION request is received. 
For each request, the robot will eventually succeed in reversing the direction of 
locomotion throughout every tunnel. At all times the robot will be in a configuration 
which is physically stable in the presence of gravity. 

The psuedo-code shown in Algorithms[T]and[2]we give a high-level view of how 
modules interact within a tunnel. In that code, a "legal" move for a FREE mod- 
ule is one where it moves into an unoccupied space and has a solid path to travel 
into that space (i.e. a flat surface made by 2 adjacent L-GROUP modules). The 
DISASSEMBLE-NOW and ASSEMBLE-NOW variables are explained in the next sub- 
section. Later, subsection l3. 61 gives further descriptions of how we execute orthog- 
onal turns or handle situations where only some modules receive a given broadcast 
signal from the external controller. Full details and proofs are omitted here due to 
length requirements. 

3.5 Algorithm Outline: Connecting Adjacent Tunnels 

With a working algorithm to generate locomotion with one tunnel, we now only 
need to keep adjacent tunnels moving at about the same rate. We do this by pass- 
ing messages through a slice whenever an L-group determines that it has finished 
assembling or is safe to be disassembled. Algorithm [3j which would also be run 
by L-GROUP modules, shows how this works for disassembly. Whenever a module 



Robomotion: Scalable, Stable Locomotion 131 



if a DIS-READY message is received from above then 
I Set DIS-READY-UP = TRUE. 

end 

if a DIS-READY message from a side direction tlien 
I Set DIS-READY-SIDE = TRUE. 

end 

if DIS-READY-UP == TRUE and DIS-READY-SIDE == TRUE and no adjacent modules 
in BACKWARD direction then 

Set DISASSEMBLE-NOW = TRUE. 

Send a DIS-READY message to the L-group modules below. 
Send a DIS-READY message to the L-group modules in side directions. 
end 



Algorithm 3. The L-GROUP module collaboration algorithm, (high level) 

switches to an L-GROUP algorithm, it initializes DIS-READY-UP = TRUE if there 
are no other L-groups above it and DIS-READY-SIDE = TRUE if there is no other L- 
group to one side of its L-group. Otherwise these values are initialized to FALSE. A 
process similar to this would be done for assembly and the ASSEMBLE-NOW safety 
check variable. 



tt 




3.6 Other Challenges Solved 

Convex-corner Transitions. Many existing hardware im- 
plementations cannot perform convex-corner turns and 
using several individual modules to form groups of fully 
functional "meta-modules" is expensive and unwieldly. 
However, there are two viable options for getting individ- 
ual modules around a convex-corner. Option 1 is to have 
that module travel along a 3rd surface which is orthogonal 
to the surfaces making that corner. Figure |5] shows an exam- ^'8- ^ Options 1 (top) 
pie of this. Option 2 would be for a second moving module ™'^ 0P^'°" ^ for aid- 

j u ti, c t J 1 . .u t ing convex-comer tran- 

to come and push the first module past that convex-corner, . . 

or pull the first module back from that corner. 

For Robomotion, the only convex-corner transitions needed are when an L-group 
is being disassembled or assembled. We have 2 methods that can allow these corner 
movements, one method for each of our 2 options given previously. Method 1 is to 
have at least 3 modules "beyond" the last L-group in the tunnel. That is, behind the 
back of the tunnel (and back of the robot) or in front of the front of the tunnel/robot. 
In this case two of those modules, along with the next L-group in the tunnel, act as 
a 3rd orthogonal surface for other modules to travel in or out of the tunnel. For our 
second method, we simply have a FREE module wait at the end of the tunnel, mak- 
ing a flat surface at the back of the robot with the last L-group in that tunnel. Now 
any module "beyond" that L-group can slide next to that waiting FREE module, and 
the waiting module could pull it into the tunnel. Conversely, a waiting module at the 
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front of the tunnel could be pushed out beyond the frontmost L-group by another 
FREE module coming behind it in the tunnel. 



Making orthogonal turns Conceptually, making 
a tunnel turn is simple: change one L-GROUP 
module to FREE and when it moves a gap is 
formed to start a new tunnel. In practice we 
need modules in the frontmost 2 L-groups to re- 
set their roles, forming 2 new L-groups facing in 
the desired orthogonal direction. Figure |6] shows 
an example of this. One of those new L-groups 
will also communicate with the last L-group in 
the old tunnel to eventually take over the IS- 
BACKEND status and disassemble that last L-group 
in the old tunnel. Finally, to coordinate multi-tunnel 
turns, only every other L-group is marked as a 
"valid" turn starter. Since tunnels have a 2-module 
width, this prevents the formation of incompati- 
ble parallel tunnels due to an "off by one" er- 
ror. 




Fig. 6 A tunnel beginning 
a turn. Yellow modules are 
FREE and purple modules 
are part of the new orthog- 
onal tunnel. 



Module Delays. Movement through the interior of the 

robot is acyclic (forward or backward in a tunnel), so no 

deadlock can occur. Similarly, dependancies between 

adjacent L-groups (for safety checks) are 1 -directional 

and so deadlock is avoided. Finally, within an L-group, 

1 module (designated by being in the middle) makes all the group decisions. It acts 

on these decisions after getting confirmation from the other 2 L-group members, so 

no race conditions or deadlocks occur within an L-group. Thus, Robomotion can 

withstand delays in module movement or computation without error. 

Delays of external messages. For our model, we anticipate that an external control 
may be used to send "change locomotion direction" messages to the robot as a 
whole. The robot is composed of many individual modules and so there could be 
delays in passing such an external message to all modules in the robot. To handle 
these delays, we place a time stamp on all external messages. This time stamp is 
the only use of memory that would not be constant-bounded. However, even this is 
bounded if modules are trusted to respond to such external messages within some 
large time span, after which we can reset the time stamp counter to zero. 



3. 7 Shape and Terrain Limitations 

Robomotion can achieve stable locomotion for any shape that can be composed from 
vertical towers of modules with small overhangs. This is because setting a constant 
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bound on the shear or tension force experienced by any module also sets a constant 
bound on the length of any overhang coming off any tower (given the weights of 
modules in that overhang). Any physically stable configuration of modules faces 
this limitation. Stronger materials or careful arrangement of modules could increase 
the constant factor length. However, the overhang would be bounded by a constant 
regardless. 

Considering terrain with obstacles, Robomotion can travel 
around obstacles but not as naturally as algorithms like Water- 
flow or MMM. If Robomotion is confronted with a few obsta- 
cles, it can turn to move around them or could "turn" one tunnel 
into an adjacent one, merging them, to flow through small gaps. 
However, for a complex maze or "briar patch" of obstacles, then Fig. 7 Very long 
it may not be worth constructing Robomotion's high-speed tun- overhangs will 
nels. In these cases we may have to accept that there is no fast 'I'^^iy Dreak. 
way to travel through and resort to using the Waterflow method 
(so we can still have physical stability) until we've moved past those obstacles. The 
situation is like building highways in a mountainous region: the faster we want to 
go, the straighter and smother the road needs to be. 



4 Simulation Results 

In addition to our theoretical results, we have also simulated our Robomotion algo- 
rithm to experimentally verify its performance. Our simulation is written in Java and 
can optionally use Java3D to display the robot modules. The basis of this simulation 
was meant to aid us in making comparisons to the leading locomotion algorithm in 
prior work: the Million Module March. While our simulation runs on a serial com- 
puter, it is a distributed rather than centralized implementation in that each module 
executes its own independent algorithm. We enforce that the only communication 
between modules is message passing between adjacent, connected modules. For 
sensing, a module A (which is cube shaped) can detect if it has an adjacent neigh- 
bor on any of its 6 sides or if it can safely move forward into an adjacent lattice 
location. Since our modules can not make convex-corner transitions, this means that 
the space is open and that there is a flat surface for travel (i.e. adjacent neighbor 
modules) between A's current location and the desired new lattice location. 

Modules do not know their global positions. The total robot configuration and 
shape is not known by any module and is not known by any external controller. 
Thus, each module is forced to act based only on local information or on messages 
received from adjacent modules. Also, since there is no global knowledge of module 
positions, any external controller cannot specify exact placements for modules in a 
desired configuration. Instead, we allow less exact specifications like direction and 
distance for locomotion or the desired number of parallel tunnels at the front of the 
robot. This method for controlling the robot is more limiting, but we believe it is 
more realistic for a scalable system in real hardware since it allows us to use only a 
constant amount of memory within each module. 
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Since our simulation runs on a single processor, the modules of course must 
execute in a serialized fashion. However, to make this more realistic, at the start of 
the simulation a randomized order is chosen for the modules. Our simulation then 
executes in iterations. Within each iteration the modules execute in the randomized 
order that was chosen and, when its turn comes, each module executes its current 
algorithm. 

To simulate message passing, a message sent from module A to module B will 
be placed on a queue for B when A executes its algorithm. Module B will then 
read all messages on its queue whenever it has its next turn to execute its own al- 
gorithm. Note that this implementation actually makes Robomotion appear slower 
than it would be when executed on actual hardware. This is because messages sent 
between modules typically take 1 iteration for travel, the same travel time as a phys- 
ical module moving between adjacent lattice locations. This communication delay 
occurs more in Robomotion than in other locomotion algorithms because of our 
question-reply format for handling messages (used to avoid errors due to dropped 
messages). 

4.1 Speed Comparisons 

To make our comparison as direct as possible, we executed simulations for the same 
test that was performed for the Million Module March algorithm. We ran cube- 
shaped collections of n modules a distance of «''^ — 1 module lengths over flat 
ground. This distance was chosen by the MMM authors because «'' ^ is the length of 
1 side of the robot's cubic shape and they wanted a 1-module length overlap between 
the start and goal locations for the robot. Tests were run for different values of n to 
see how the algorithm performed for different robot sizes. The most important speed 
statistic for a locomotion algorithm is probably how long it takes the robot to travel 
a single module-length or "unit" distance. Thus, we took the total number of time 
steps taken by a robot during a locomotion test and divided by the «'' ^ — 1 distance 




Fig. 8 Left: Robomotion stays fast even with huge numbers of modules in the robot. 
Right: Robomotion SAM rate remains high even as the number of modules grows large. 
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traveled for robots of size n. Statistics for Robomotion come from our simulation 
while numbers for MMM are taken from its own publication. ISJ 

For this test we have also simulated a simple implementation of the "Waterflow" 
style of locomotion to use as a baseline comparison along with MMM and Robo- 
motion. As mentioned earlier in our Related Work Section (Section [SJl, Waterflow 
is a style of algorithm where modules at the back of the robot slide up, slide across 
the top of the robot, and then slide down the front to become the new front of the 
robot. In this way, the modules tend to "flow" like water toward the desired goal 
location. Many different research groups have devised different ways to control this 
style of locomotion, yet any implementation should have roughly the same module 
movements and so should give roughly the same locomotion speed (differing only 
by small constant factors). 

In Figure [8] we illustrate the statistical speed comparison between these 3 key 
locomotion algorithms. Here a "time step" is the time for a single module to move a 
single module length (unit distance). Experimentally, we found that the running time 
for the Waterflow algorithm had near-perfect linear growth compared to n^ ", where 
n is the number of modules in the robot. Up through robots of size 40^ MMM had 
similar running time growth, but was less efficient than Waterflow. However, from 
n — 40^ to n = 75^ MMM actually had a slight speed-up. We're not sure of the cause, 
but one educated guess is that traveling a longer distance (49 and 74 unit distances 
for the last 2 data points shown) gave modules in the robot enough time to spread out 
into a flatter robot shape. The dense cube shape of the initial configuration probably 
limited movement. 

Meanwhile, the running time for our Robomotion algorithm is a stark contrast to 
the other two. We maintain near-constant speed for all values of n that we tested. 
Specifically, the running time hovered between 12.33 time steps per unit distance 
traveled for n = 10^ and 13.43 for n = 80^ = 512,000 modules. This consistent 
performance is not surprising since each "tunnel" through the robot is predomi- 
nantly independent of all others. The only slight delays are the message checks 
sent between tunnels to keep any one from getting ahead of (or falling behind) in its 
movement speed compared to adjacent tunnels. Experimentally, there was very little 
effect (less than I time step per unit distance traveled) when we added these checks 
to our algorithm. If implemented on actual hardware, we expect the effect would 
be even smaller since messages should travel much faster than moving modules but 
our simulation used 1 time step for either action. On hardware, we also expect that 
Robomotion and Waterflow would both gain in speed since these algorithms have 
modules travel long straight distances, from the robot's back to its front. These mod- 
ules could increase movement speed or conserve momentum by not stopping during 
this trip since there's no need to re-plan their travel path. 

In addition to the speed of the robot, we also looked at the Simultaneous Active 
Movement (SAM) rate for these 3 locomotion algorithms. Recall that we defined 
this measure in Section[3]as the ratio of the number of moving modules compared to 
the total number of modules in the robot. Figure[8]shows the average percentage of 
modules that were moving at any given time, which is basically just taking the total 
number of actuations made and dividing by the total time steps taken and by the total 



136 S. Slee and J. Reif 

number of modules. The results in this graph are very similar to the previous graph. 
Waterfow and MMM both start with good SAM percentages, but lose efficiency as 
the number of modules in the robot grows large. This time it is Waterflow which 
is slightly less efficient, and beyond n — 40^ there is again a slight improvement 
for MMM. In contrast to these effects, Robomotion once again remains steady, and 
fast, ranging from about 9.8% to 7.5% as the number of modules increase. We note 
that while Robomotion is substantially better on both metrics, its advantage on raw 
locomotion speed seems slightly better than its SAM rate advantage. A likely cause 
is that Robomotion becomes more efficient with longer tunnels, and so for large 
values of n (when the cube shape is longer in each dimension) it was able to be 
nearly as fast even while having a very slight drop in SAM percentage. 



5 Conclusion 

In this paper we have presented a novel locomotion algorithm for lattice-style self- 
reconfigurable robots. This algorithm is highly scalable, produces movement which 
is more efficient than prior locomotion approaches, and always keeps the robot phys- 
ically stable in the presence of gravity. We do this without using convex-corner 
transitions and withstand possible failures in message passing and delays in mod- 
ule execution. Thus, we believe this to be a good step toward developing scalable 
control algorithms which would actually work on real hardware implementations. 

For future work, the way in which we make orthogonal turns is one potential area. 
Our current algorithm focuses on individual tunnels, so turning multiple tunnels can 
be inefficient. Extending our algorithm to very rough terrain is another important 
step, but is difficult for any physically stable algorithm. As we've described, straight 
support columns must be maintained for any configuration to minimize shear and 
tension forces. A likely compromise may be a heuristic-based algorithm that uses 
Robomotion for flat or semi-rough terrain but reverts to Waterflow when faced with 
highly irregular terrain (where high-speed locomotion may be impossible anyway). 
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Adaptive Time Stepping in Real-Time 
Motion Planning 



Kris Hauser 



Abstract. Replanning is a powerful mechanism for controlling robot motion 
under hard constraints and unpredictable disturbances, but it involves an 
inherent tradeoff between the planner's power (e.g., a planning horizon or 
time cutoff) and its responsiveness to disturbances. We present a real-time 
replanning technique that uses adaptive time stepping to learn the amount of 
time needed for a sample-based motion planner to make monotonic progress 
toward the goal. The technique is robust to the typically high variance ex- 
hibited by planning queries, and we prove that it is asymptotically complete 
for a deterministic environment and a static objective. For unpredictable en- 
vironments, we present an adaptive time stepping contingency planning al- 
gorithm that achieves simultaneous safety- seeking and goal-seeking motion. 
These techniques generate responsive and safe motion in simulated scenarios 
across a range of difficulties, including applications to pursuit-evasion and ag- 
gressive collision-free teleoperation of an industrial robot arm in a cluttered 
environment. 



1 Introduction 

Robots must frequently adjust their motion in real-time to respond to un- 
modeled disturbances. A common approach to deal with nonlinear dynamics 
and hard state and control constraints is to reactively replan at each time 
step (Figure [1}. This basic approach has been studied under various nomen- 
clature (model predictive control, receding horizon control, or real-time plan- 
ning) and using various underlying planners (numerical optimization, forward 
search, or sample-based motion planners), and is less susceptible to local min- 
ima than myopic potential field approaches. But replanning, in all its forms, 
faces a fundamental tradeoff based on the choice of time limit: too large, and 
the system loses responsiveness; too short, and the planner may fail to solve 
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Fig. 1 A point robot (green) interleaves execution and replanning to reach an 
unpredictably moving target (red). 

difficult problems in the allotted time, which sacrifices global convergence 
and safety. Empirical tuning by hand is the usual approach. But the time 
needed to solve a planning query can vary by orders of magnitude not only 
between problems, but also between different queries in the same problem, 
and even on the same query (in the case of randomized planners). Unless 
variability is addressed, the safety and completeness of real-time replanning 
is in doubt. 

This paper presents two replanning algorithms that address safety and 
completeness not by reducing variability in planning time, but by tolerating 
and adapting to it. They use a sample-based planner to build partial plans 
whose endpoints monotonically improve an objective function, and adaptively 
learn a suitable time step on-the-fly by observing whether the planner is able 
to make progress within the time limit. The first algorithm, described in Sec- 
tion [31 guarantees safe motion in deterministic, predictable environments by 
construction, and furthermore wc prove that the state of the robot is guaran- 
teed to globally optimize the objective function in expected finite time for a 
large class of systems. We apply it to real-time obstacle avoidance for a simu- 
lated 6D0F industrial robot moving dynamically in a cluttered environment. 
The second algorithm, described in Section^ uses a conservative contingency 
planning approach to achieve a higher probability of safety in unpredictable 
or adversarial environments, and we apply it to a pursuit-evasion problem. 
Experiments suggest that adaptive time stepping is more consistent than 
constant time stepping across problem variations for both algorithms. 



2 Related Work 



Bounded Rationality in Real-Time Agents. Real-time planning architectures 
have a long history of study in artificial intelligence, control theory, and 
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robotics, but few have explicitly addressed the problem of "bounded ratio- 
nality", vifhere limited computational resources hamper an agent's ability 
to produce timely, optimal plans. A notable exception is the CIRCA real- 



time agent architecture [12| that separates the agent's control into high-level 
planning and low-level reactive control tasks. The high-level task conveys 
controller specifications to the low-level task whenever planning is complete. 
The disadvantage of this approach is that uncertainty in computation time 
causes uncertainty in state, leading to harder planning problems. By con- 
trast our approach is constructed to avoid state uncertainty, at least when 
the system is deterministic, which makes planning more tractable. 

Replanning Applications and Implementations. Model predictive control 
(MPC), aka receding horizon control, is a form of replanning that at each 
time step formulates a optimal control problem truncated at some horizon. 
Such techniques have been successful in robot navigation [2, Il6l | ; for example 
the classic dynamic windowing technique introduced for indoor mobile robot 
navigation is essentially MPC by another name [igj. In nonlinear systems, 
truncated optimal control problems are often solved using numerical opti- 
mization or dynamic programming [l|, [llj. In discrete state spaces, efficient 
implementations of replanning algorithms include the D* and Anytime A* 
algorithms which are based on classic heuristic search [lO|, |l7|, |l8| . 

Sample-based motion planners such as randomly-exploring random trees 
(RRTs) and expansive space trees (ESTs) have been applied to real-time re- 
planning for dynamic continuous systems [J, '5, < , 20] . RRT and EST variants 
have been applied to 2D helicopter navigation 5] , free-floating 2D robots [7|] , 
and and car-like vehicles |13j among moving obstacles, as well as exploring an 
unknown environment 3-]. Our algorithms also use sample-based planners. 

Time Stepping in Replanning. Although many authors have proposed frame- 
works that can handle nonuniform time steps [3. 5. 13l.l2Cll|. few actually exploit 
this capability to adapt to the power of the underlying planner. We are aware 
of one paper in the model predictive control literature [14] that advances time 
exactly by the amount of time taken for replanning. The weakness of this ap- 
proach is that if replanning is slow, the actions taken after planning are based 
on outdated state estimates, leading to major instability and constraint viola- 
tions. Our work avoids this problem by setting planner cutoffs and projecting 
state estimates forward in time at the start of planning. 

Safety Mechanisms. Several mechanisms have been proposed to improve the 
safety of replanning in dynamic environments. Feron et al introduced the 
notion of r-safety, which indicates that a trajectory is safe for at least time 
T [5|- Such a certificate establishes a hard deadline for replanning. Hsu et al 
introduced the notion of an "escape trajectory" as a contingency plan that 
is taken in case the planner fails to find a path that makes progress toward 
the goal |7] . We use a contingency planning technique for unpredictable envi- 
ronments that is much like a conservative escape trajectory approach, except 
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that it always ensures the conservative path is fohowed in case of a planning 
failure. 

The notion of inevitable collision states (ICS) was introduced by Petti and 
Fraichard to the problem of real-time planning for a car-like vehicle among 
moving obstacles [13,]. An ICS is a state such that no possible control can 
recover from a collision, and considering ICS as virtual obstacles prevents 
unnecessary exploration of the state space. In practice, testing for ICS can 
only be done approximately, and the conservative test proposed in [l3| may 
prevent the robot from passing through states that are actually safe. Our 
work provides similar safety guarantees without explicit testing for ICS. 

Speeding up Replanning. Many approaches have sought to improve responsive- 
ness by simply reducing average replanning time. Some common techniques 
are to reuse information from previous plans [J], to use precomputed coarse 
global plans to essentially reduce the depth of local minima [3, M, UM, llSf , or 



a combination 3,, 2Q!\ ■ These approaches are mostly orthogonal to the choice 
of time step and can be easily combined with adaptive time stepping. 



3 Replanning in Deterministic Environments 

In real-time replanning the robot interleaves threads of replanning and exe- 
cution, in which the robot (at high rate) executes a partial trajectory that 
is intermittently updated by the replanning thread (at a lower rate) without 
interrupting execution. The planner is given a time cutoff Z\, during which 
it generates a new safe trajectory originating from a state propagated in 
the future by time A. This section presents and analyzes the adaptive time- 
stepping technique and an application to real-time assisted teleoperation of 
a robot manipulator in deterministic environments. 



3.1 Assumptions and Notation 

The state of the robot x lies in a state space S, and its motion must obey 
differential constraints x G U{x,t) (note that this is simply a more compact 
way of writing control constraints). We assume that the robot has a possibly 
imperfect model of the environment and how it evolves over time, and let 
F(t) C S denote the subset of feasible states at time t. We say that a trajec- 
tory y{t) is T-safe if y{t) G U{t) and y{t) G F{t) for ah < i < r. If so, we 
say y{t) is an Fr trajectory. 

In this section we will be concerned primarily with F^ trajectories. We 
will assume that F^o feasibility is achieved by ensuring that each trajectory 
terminates at a feasible stationary state. For certain systems with dynam- 
ics, such as cars and helicopters, a "braking" control can be applied. This 
paper will not consider systems like aircraft that cannot reach zero velocity, 
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although terminal cycles may be considered as a relatively straightforward 
extension. 

We address the problem of reaching a global minimum of a smooth time- 
invariant potential function V{x) via an Fqo trajectory y{t) starting from 
the initial state xq. Assume the global minimum is known and attained at 
V{x) = without loss of generality. We say any trajectory that reaches 
V(x) = is a solution trajectory. We do not consider path cost, and define 
the cost functional C(y) that simply returns the value of V{x) at the terminal 
state of the trajectory y. It is important to note that when we refer to an 
optimal solution, we are referring to the optimality of the terminal point, not 
the trajectory taken to reach it. We also impose the real-time constraint that 
no portion of the current trajectory that is being executed can be modified. 
So, if a replan is instantiated at time t and is allowed to run for time Z\, then 
no portion of the current trajectory before time t + A may be modified. 

We assume that we have access to an underlying planner with the following 
"any-time" characteristics: 

1. The planner iteratively generates Fqo trajectories starting from an initial 
state and time y{to) given as input. 

2. Planning can be terminated at any time, at which point it returns the 
trajectory that attains the least value of the cost functional C{y) found 
so far. 

3. If the planner is given no time limit on any query that admits a solution 
trajectory, then the planner finds a solution in expected finite time. 

A variety of underlying planning techniques (e.g., trajectory optimization, 
forward search, and sample-based motion planning) can be implemented in 
this fashion. All experiments in this paper are conducted with minor variants 
of the sampling-based planners RRT [9] and SBL 15], which grow trees using 
forward integration of randomly-sampled control inputs. The running time of 
such planners is variable across runs on a single query, and can vary by orders 
of magnitude with the presence of narrow passages in the feasible space. 

3.2 Adaptive Time- Stepping with Exponential Backoff 

Here we describe our variable-time step replanning algorithm and a simple 
but effective exponential backoff strategy for learning an appropriate time 
step. Pseudocode is listed in Algorithm 1 in Figure [51 The replanning thread 
takes time steps Z\i, A2, .... In each time step, the planner is initialized from 
the state on the current trajectory at time tk + Ak, and plans until Ak time 
has elapsed (Line 2). If the planner finds a trajectory with lower cost than 
the current trajectory (Line 3) then the new trajectory is spliced into current 
trajectory at the junction t^ -f Ak (Line 4). Otherwise, the current trajectory 
is left unaltered and replanning repeats. 



144 K. Hauser 



Algorithm 1. Replanning with an Adaptive Time-Step 

Initialization: 

Oa. y{t) is set to an Foo initial trajectory starting from i = 0. 

Ob. Z\i is set to a positive constant. 

Repeat for k = 1, . . .: 

1. Measure the current time tk 

2. Initialize a plan starting from y{tk + ^k), and plan for Ak time 

3. If C{y) < C{y) — e for the best trajectory y{t) generated so far, then 

4. Replace the section of the path after tk + Ak with y 

5. Set Ak+i = 2/iAk 

6. Otherwise, 

7. Sot Ak+i = 2Ak 



Fig. 2 Pseudocode for the replanning algorithm. 

A2 



Current time 

^ ^ Ai ^ Replanning.. 

Time step 



... succeeds 



y(t) 






t\ ?2 



-► ...fails -^ ^2 




Fig. 3 Each replanning iteration chooses a time step (left), initiates a plan starting 
from the predicted future state (center), and either succeeds or fails. Upon success, 
the robot progresses on the new plan and the time step is contracted. Upon failure, 
the robot retains the original plan and the time step is increased. 



Note that the condition in Line 3 requires a decrease by some small con- 
stant e > 0. (To allow the planner to reach the global minimum exactly, an 
exception can be made on the final step when C(y) = is attained.) This 
simplifies later analysis by preventing the theoretical occurrence of an infinite 
number of infinitesimal cost improvements. 

Lines 5 and 7 implement a simple exponential backoff strategy for choosing 
the time cutoff. This permits recovery from a local minimum of V{x) in case 
several planning failures are encountered in sequence. Such strategies are 
widely used in protocols for handling network congestion, and there is a rough 
analogy between uncertainty in planning time and uncertainty in message 
delivery over an unreliable network. The idea is simple: if the planner fails, 
double the time step (Line 7). If it succeeds, contract the time step (Line 5). 
The constant 2/3 that we use in the contraction strategy does not need to 
be chosen particularly carefully; resetting Z\fc+i to a small value works well 
too. Figure [3] illustrates one iteration of the protocol. 
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3.3 Completeness and Competitiveness 

We can now state a basic theorem that guarantees that Algorithm 1 is proba- 
bilistically complete for static goals as long as the robot never reaches a state 
where the goal becomes unreachable. 

Theorem 1. If the environment is deterministic and perfectly modeled, and 
the goal is reachable from any state that is reachable from the start, then 
Algorithm 1 will find a solution trajectory in expected finite time. 

Proof. Let TZ be the set of states reachable from the start, and let T{x) 
be the expected planning time for finding a solution trajectory starting at 
x e 7^. Because of the assumption in Section [3Tl T{x) is finite, and so is the 
maximum of T{x) over all TZ, which we denote to be T^ax- First we will show 
that the time until a plan update has a finite expected value. 

Suppose Algorithm 1 has its first plan update on the fc'th iteration after 
k — 1 failed iterations (the possibility that no such k exists is vanishingly 
small). Because fc — 1 iterations have passed without an update, the planning 
cutoff on the fc'th iteration is 2''Ao. So the total time spent T over all the 
fc iterations is a geometric series with sum T = (2*^+^ — l)Z\o- Let Tp be 
the random variable denoting the planning time necessary to find a global 
solution starting at {y{tk),tk). If k were known, then Tp would be lie in 
the range (2''"^Z\o, 2''Z\o], so that the inequality T < ATp holds. But the 
inequality E[Tp] < Tmax holds for aU fc, Xk, and tk, so E\T] < E[ATp\ < 4T„^ax 
unconditionally. 

The number N of plan updates needed to reach a global minimum is finite 
since the initial trajectory has finite cost, and each plan update reduces cost 
by a significant amount. So, the total expected running time of Algorithm 1 
is bounded by ANTmax, which is finite. D 

We remark that the bound ANTmax is extremely loose, and seemingly poor 
compared to the performance bound Tmax of simply planning from the initial 
state until a solution is found. In practice, most problems contain few plan- 
ning queries of extremely high difficulty corresponding to escaping deep local 
minima of C, and the running time of Algorithm 1 will tend to be dominated 
by those queries. Smaller, greedy advances in C{y) are often much quicker 
to plan. 

By construction Algorithm 1 will never drive the robot to an inevitable col- 
lision state (ICS) as defined in [l3|], so it is equivalently "safe". But in which 
systems is it asymptotically complete? The key assumption of Theorem 1 is 
that the goal can be reached by all states in TZ (this can be slightly weakened 
to take TZ as those states actually reached by the robot during execution). 
For example, it holds in reversible systems. In general, however. Algorithm 
1 might inadvertently drive the robot into a dead end from which it can- 
not escape — a condition we might call an inevitable failure state. In fact, 
non-reversible systems seem to prove quite troublesome to all replanning tech- 
niques because detecting dead ends requires sufficient global foresight that 
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Fig. 4 (a) Our 2D benchmark problem. Passage width, and hence, difficulty, is pa- 
rameterized by w. (b) The performance of constant cutoff strategies varies greatly 
across passage widths, and short cutoffs (0.05, 0.1, and 0.2) fail completely on 
difficult problems. The adaptive exponential backoff strategy (E.B.) achieves con- 
sistent performance across problem variations. Performs is measured by solution 
time, normalized by the time of the best constant cutoff for that problem. 

might not be practical to achieve with hmited computation time. These are 
fruitful directions for future work. 

As a final note, we remark that Algorithm 1 is not necessarily complete in 
problems with time-varying potential V. For example, if the global minima 
of V might alternate quickly between two locally easy but globally difficult 
problems, then the algorithm will forever be able to make local progress and 
will thereby keep the time step short. 



3.4 Completeness and Sensitivity Experiments 

We evaluated the performance of the adaptive strategy against constant time 
stepping strategies on a static 2D benchmark across varying problem diffi- 
culties. Consider a unit square state space S where the state is subject to 
velocity constraints ||i;|| < 1. Obstacles partition the feasible space F{t) into 
two "rooms" with opposite-facing doorways, which are connected by hallways 
(see Figure|3I. The state must travel from (0.3,0.5) to (0.6,0.5), and the po- 
tential function V{x) simply measures the distance to the goal. For replanning 
we use a unidirectional RRT planner [9|| , which, like other sample-based plan- 
ners, is sensitive to the presence of narrow passages in the feasible space. We 
control the difficulty of escaping local minima by varying a parameter w, and 
set the hallway widths to w and the doorway widths to 2w. 

We measure performance as the overall time taken by the robot to reach 
the goal, averaged over 10 runs with different random seeds. If it cannot reach 
the goal after 120 s, we terminate the run and record 120 s as the running time. 
Experiments compared performance over varying w for constant cutoffs 0.05, 
0.1, 0.2, 0.5, 1, 2, and 5 s and the exponential backoff algorithm starting with 
Z\i =0.1 (performance was relatively insensitive to choice of Ai). 

Figure H] plots the performance ratios of several strategies. Performance 
ratio is measured relative to the best constant time step for that problem. 
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Fig. 5 A Staiibli TX90L manipulator is commanded in real time to move its end 
effector in a clockwise circle in a cluttered environment. The robot responds re- 
actively to the target's motion. Along the upper semicircle, rapid replanning with 
a short time step allows the target to be followed closely. When obstacles are en- 
countered on the lower semicircle, planning becomes more difficult. Adaptive time 
stepping gives the planner sufficient time to enter and escape deep narrow pas- 
sages. The current plan is drawn in orange, and its destination configuration is 
drawn transparently. 

Shorter cutoffs are unreliable on hard problems because the planner is unable 
to construct paths that escape the local minimum of the initial room. On 
the other hand, longer cutoffs waste time on easier problems. The adaptive 
strategy delivers consistent performance, performing no worse than 1.4 times 
that of the best constant cutoff across all problems. 



3.5 Assisted Teleoperation Experiments on a 6DOF 
Manipulator 

Replanning interleaves planning and execution, so motion appears more fluid 
than a pre-planning approach. This is advantageous in human-robot inter- 
action and assisted teleoperation applications where delays in the onset of 
motion may be viewed as unnatural. We implemented a teleoperation system 
for a dynamically simulated 6D0F Staiibli TX90L manipulator that uses re- 
planning for real-time obstacle avoidance in assisted control. The robot is able 
to reject infeasible commands, follow commands closely while near obstacles, 
and does not get stuck in local minima like potential field approaches. 

In this system, an operator controls a 3D target point (for example, using 
a joystick or a laser pointer) , and the robot is instructed to reach the point 
using its end effector. The robot's state space consists of configuration x 
velocity, and its acceleration and velocity are bounded. Its configuration are 
subject to joint limit and collision constraints. The objective function for the 
planner is an unpredictably time- varying function V{x, t) which measures the 
distance from the end effector to the target point. 
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Fig. 6 Traces of the end effector's desired position (Desired), the position at the 
current plan's destination configuration (Destination), and actual position as exe- 
cuted by the robot (Actual) for the experiment in Figure [S] 



Our underlying planner is a unidirectional variant of the SBL motion plan- 
ner [l5| that is adapted to produce dynamically feasible paths. We made the 
following adjustments to the basic algorithm: 

• We extend the search tree by sampling extensions to stationary config- 
urations sampled at random. The local planner constructs dynamically 
feasible trajectories that are optimal in obstacle free environments (a sim- 
ilar strategy was used in [5]). To do so, we use analytically computed 
trajectories that are time-optimal under the assumption of box-bounds on 
velocity and acceleration _6j. 

• For every randomly generated sample, we generate a second configuration 
using an inverse kinematics solver in order to get closer to the target. 

• SBL uses a lazy collision checking mechanism that improves planning time 
by delaying edge feasibility checks, usually until a path to the goal is found. 
We delay edge checks until the planner finds a path that improves C{y). 

• To improve the fluidity of motion, we devote 20% of each time step to 
trajectory smoothing. We used the shortcutting heuristic described in [6| 
that repeatedly picks two random states on the trajectory, constructs a 
time-optimal segment between them, and replaces the intermediate portion 
of the trajectory if the segment is collision free. 

The simulation environment is based on the Open Dynamics Engine rigid- 
body simulation package, where the robot is modeled as a series of rigid links 
controlled by a PID controller with feedforward gravity compensation and 
torque limits. The simulation does perform collision detection, but in our 
experiments the simulated robot did not collide with the environment. 

We simulated a user commanding a target to follow a circular trajectory 
that passes through the robot and obstacles (Figure O- The circle has radius 
0.8 m and a period of 20 s. The upper semicircle is relatively unconstrained 
and can be followed exactly. Targets along the lower semicircle are signifi- 
cantly harder to reach; at several points they pass through obstacles, and at 
other points they require the robot to execute contorted maneuvers through 
narrow passages in the feasible space. Experiments show that replanning can 
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reach a large portion of the lower semicircle while tracking the upper semi- 
circle nearly perfectly (Figure [HI- 



4 Replanning in Unpredictable Environments 

A conservative approach to uncertainty may be preferred in safety-critical 
applications like transportation and medical robotics. This section presents 
a real-time contingency planning algorithm that generates both optimistic 
(goal seeking) and pessimistic (safety seeking) trajectories to balance safety- 
seeking and goal-seeking behavior. Adaptive time stepping allows for a high 
probability of replanning before a certain time limit — the time to potential 
failure, or TTPF — in which safety is guaranteed. Experiments evaluate the 
system in a pursuit-evasion scenario. 

4-1 Conservative Replanning Framework 

We assume that we have access to conservative bounds on the uncertainty of 
the environment, and let E^ denote the environment model estimated by the 
robot's sensors at fc'th time step. Let F{t; Ek) denote the feasible set with the 
current model, and let F{t; Ek) denote the set of states that is guaranteed to 
be feasible at time t under the conservative uncertainty bounds. For example, 
if obstacle velocities are bounded, then one can consider a conservative space- 
time "cone" of possible obstacle positions that grows as time increases. 
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Fig. 7 Snapshots taken at half-second intervals from a pursuit-evasion experiment 
in the unit square. Two pursuers (grey circles) seek the evader (green) greedily at 
half the speed of the evader. The evader knows the pursuers' velocity bound but not 
their behavior. The evader replans a pessimistic path (red) to avoid the pursuers in 
the worst case, and replans an optimistic path (cyan) in order to reach the goal in 
the center of the room. Both paths share a common prefix. The trace of the robot 
between frames is drawn as a purple trail. 
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Consider a purely safety-seeking robot that uses the following scheme: 

1. The current trajectory y{t) has a time to potential failure (TTPF) T if 
it is safe for some duration T under conservative bounds on uncertainty. 
That is, yit) £ F(i; Ek) for all t e [tk, tk + T]. 

2. Replanning searches for a path y that increases the TTPF to T + Ak or 
some constant Tmim whichever is lower. 

It is straightforward to use Algorithm 1 to implement such behavior simply 
by using the TTPF as an optimization criterion. The robot will remain safe 
unless replanning cannot improve the TTPF within the duration T (and even 
then, a constraint violation only happens in the worst casejj. A violation may 
occur if 1) no trajectory that improves the TTPF exists, in which case the 
planner can do nothing except hope that the potential hazard goes away, or 
2) not enough planning time was devoted to finding a safe path. 

The risk of condition (2) is somewhat mitigated by the selection of the 
parameter Tmim which governs an "acceptable" threshold for the TTPF. 
Below this threshold, the planner enforces that subsequent pessimistic plans 
must increase the TTPF. Naturally, if safety were the robot's only objective, 
the best approach is to set Tmin to be infinite. In the below section, a finite 
Train will allow it to make optimistic progress toward a target while being 
acceptably confident that safety will be ensured. 

If the robot must also seek to optimize an objective V{x)^ it must sacrifice 
some safety in order to do so. Below we describe a contingency planning 
framework where the robot's path has a similarly high probability of safety 
as the above scheme, but the planner seeks to simultaneously increase the 
TTPF and makes progress towards reducing V{x). 



4-2 A Contingency Replanning Algorithm 

In our contingency planning algorithm, the robot maintains both an opti- 
mistic and a pessimistic trajectory that share a common prefix (Figure [7| . 
The role of the pessimistic trajectory is to optimize the TTPF, while the 
role of the optimistic trajectory is to encourage consistent progress toward 
the goal. 

Pseudocode is listed in Figure [H The pessimistic trajectory y{t) is main- 
tained and followed by default. The optimistic trajectory y°(i), if it exists, is 
identical to y{t) until the "junction" time tj. Each iteration of the replanning 
loop begins by establishing time limits for the optimistic and the pessimistic 
planners, with sum A^ (Line 2). Then a top-level decision is made whether 
to initiate the new plan from the optimistic or the pessimistic trajectory: 



^ A major benefit of sample-based replanning is that holding TTPF constant, a 
factor n increase in computational speed results in a sharp reduction in failure 
rate from p to p". 
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Algorithm 2. Contingency Replanning with Adaptive Time Steps 

Initialization: 

Oa. y(t) ^- an initial trajectory starting from i = 0. 

Ob. y°{t) ^ nil. 

Oc. Junction time tj ^- 

Repeat for k = 1, . . .: 

1. Measure the current time t^. 

2. Pick a pessimistic and optimistic time limit A^ and A^. Let Ak — A^ + A1 

3. If tj > tk + Ak (branch the new plan from the optimistic path) 

4. Plan an improved optimistic path starting from yo{tj). 

5. Plan a pessimistic path y starting from yoitj + Ak)- 

6. If Line 5 is successful, then 

7. Set y{t) ^ yo{t) for t < tj + A^, and y{t) ^ y{t) for t > tj + Ak. 

8. Set tj «- tj + Ak- 

9. End 

10. Otherwise, (branch the new plan from the pessimistic path) 

11. Plan an optimistic path starting from y{tk + At). 

12. If successful, then 

13. Plan a pessimistic path y starting from yo{tk + 2Zifc). 

14. If successful, then 

15. Set y{t) ^ yo(t) for ife + Z\fc < t < tfc + 2Ak, and y{t) <- y{t) for 
t>tk + 2Ak. 

16. Settj ^tk + 2Ak■ 
n. End 

18. Otherwise, 

19. Plan a pessimistic path y starting from y{tk + Ak)- 

20. If successful, set y{t) ^- y{t) for t > tk + Ak- 



Fig. 8 Pseudocode for the contingency replanning algorithm. 

• From the optimistic trajectory (Lines 4-9). To continue progress along 
y° after time tj, the robot must generate a pessimistic trajectory that 
branches out of y° at some time after tj . An improvement to the optimistic 
plan is attempted as well. 

• From the pessimistic trajectory (Lines 11-20). To progress toward the tar- 
get, the planner will attempt to branch a new pessimistic and optimistic 
pair out of the current pessimistic trajectory at time tk + Ak- The new 
junction time will be tk + 2Ak- If this fails, the planner attempts an ex- 
tension to the pessimistic path. 

To improve the optimistic path, the planner constructs a path in the op- 
timistic feasible space F{t] tc) based on the current environment model. A 
query is deemed successful if, after time limit Z\JJ, C{y°) is improved over the 
current optimistic path if it exists, or otherwise over the current pessimistic 
path. If the query fails, y° is left untouched. Pessimistic queries are handled 
exactly as in the prior section. 
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Fig. 9 Pursuit-evasion environments 1 and 2. Narrow passages, and lience, diffi- 
culty, are parameterized by w. The evader (green) must try to reach the target 
(red) within 10 s while avoiding the pursuer (blue), with capture radius 0.05. 
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Fig. 10 (a) Survival time and (b) success rates for evader time-stepping strategies 
on problem 1. Results were averaged over ten trials on each passage width. The 
adaptive strategy (E.B.) performs as well as the best constant cutoff, and is more 
consistent across problem variations. 



To choose planning times, we again use an adaptive time stepping scheme 
using the exponential backoff strategy of Section 13.21 Pessimistic and opti- 
mistic planning times are learned independently. We also make adjustments 
in case the candidate time step exceeds the finite TTPF of our paths. First, 
if we find that Ak exceeds the TTPF T of the pessimistic path, that is, fail- 
ure may occur before planning is complete, we set Z\^ = T/2 and Z\^ = 0. 
Second, if we are attempting a modification to the optimistic trajectory, and 
the TTPF of the optimistic trajectory T° is less than tj + Ak, then we scale 
A^ and A'^ to attempt a replan before T° (otherwise, the pessimistic replan 
is guaranteed to fail). 



4-3 Experiments on a Pursuit- Evasion Example 

Our experiments evaluate how contingency planning strategies affect an 
evader's performance in a planar pursuit-evasion scenario. The evader's 
goal is to reach a target within 10 s before being captured by a pursuer. 
The evader and pursuer move at maximum speeds 1 and 0.5, respectively. 
The evader treats the pursuer as an unpredictable obstacle with bounded 



Adaptive Time Stepping in Real-Time Motion Planning 



153 




0.01 0.02 0.05 0.1 
Passage width 



0.01 0.02 0.05 0.1 
Passage Width 



t Cutoff 0.05 
■ Cutoff 0.1 
A Cutoff 0.2 
)( Cutoff 0.5 
• ••• E.B. 



Fig. 11 Success rates for evader time-stepping strategies on problem 2 for a (a) 
nonadversarial and (b) adversarial pursuer behaviors. In the nonadversarial case the 
pursuer is allowed to pass through obstacles. A shorter time step (Cutoff 0.2) per- 
forms well in the nonadversarial case, but a longer time step (Cutoff 0.5) performs 
better in the adversarial case. The adaptive strategy works well in both cases. 



velocity, and uses Algorithm 2 with Tmin = 1-0. The evader's conservative 
model of F{t,Ek) does not consider walls to be impediments to the pur- 
suer's possible movement. The pursuer treats the evader as an unpredictably 
moving target, and uses Algorithm 1 to reach it. 

Holding the pursuer's behavior constant, we varied the environment diffi- 
culty and evader's time stepping strategy on Problem 1 (Figure [5Ja)). Here 
the pursuer begins in a room with the evader, which must escape through a 
narrow passage to reach the target in a second room. Figure [TU] shows that 
narrow passage width does not affect the evader's survival much, but rather, 
responsiveness is more important to enable it to dance around an approaching 
pursuer. The adaptive time strategy appropriately finds short time steps. 

Next, we considered a more difficult environment. Problem 2 (FigurelHJb)). 
Mere survival is not challenging (in all experiments it was over 90%), but 
reaching the target is; success requires the evader to choose a different hall- 
way than the pursuer. We tested a nonadversarial pursuer behavior in which 
it "wanders" with velocity varying according to a random walk, and is allowed 
to pass through walls. Figure [TTT a) shows that in this case, the success rate 
is highly dependent on problem difficulty, and no constant cutoff performs 
uniformly well across all width variations. Similar variations were found us- 
ing an adversarial pursuer (Figure [TTJb)). The adaptive strategy performed 
nearly as well as the best constant cutoff across all problem variations. 



5 Conclusion 

The runtime variance of planning queries has been a major impediment 
to the adoption of replanning techniques in real-time robot control. This 
paper addresses this problem by introducing two adaptive time-stepping 
algorithms - a simple one for deterministic environments, and a more 
complex one for nondeterministic environments ~ that tolerate run-time 
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variance by learning a time step on-thc-fly. Experiments on shared control 
for an industrial robot arm and on pursuit-evasion examples suggest that 
replanning may be a viable mechanism for real-time navigation and obstacle 
avoidance. Additional videos of our experiments can be found on the web at 
http://www.iu.edu/motion/realtime.html. 
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The Bayes Tree: An Algorithmic Foundation for 
Probabilistic Robot Mapping 

Michael Kaess, Viorela Ila, Richard Roberts, and Frank Dellaert 



Abstract. We present a novel data structure, the Bayes tree, that provides an al- 
gorithmic foundation enabling a better understanding of existing graphical model 
inference algorithms and their connection to sparse matrix factorization methods. 
Similar to a clique tree, a Bayes tree encodes a factored probability density, but 
unlike the clique tree it is directed and maps more naturally to the square root in- 
formation matrix of the simultaneous localization and mapping (SLAM) problem. 
In this paper, we highlight three insights provided by our new data structure. First, 
the Bayes tree provides a better understanding of batch matrix factorization in terms 
of probability densities. Second, we show how the fairly abstract updates to a ma- 
trix factorization translate to a simple editing of the Bayes tree and its conditional 
densities. Third, we apply the Bayes tree to obtain a completely novel algorithm for 
sparse nonlinear incremental optimization, that combines incremental updates with 
fluid relinearization of a reduced set of variables for efficiency, combined with fast 
convergence to the exact solution. We also present a novel strategy for incremental 
variable reordering to retain sparsity. We evaluate our algorithm on standard datasets 
in both landmark and pose SLAM settings. 

Keywords: graphical models, clique tree, probabilistic inference, sparse linear al- 
gebra, nonlinear optimization, smoothing and mapping, SLAM, iSAM. 



1 Introduction 

Probabilistic inference algorithms are important in robotics for a number of appli- 
cations, ranging from simultaneous localization and mapping (SLAM) for building 
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geometric models of the world, to tracking people for human robot interaction. Our 
research is mainly in large-scale SLAM and hence we will use this as an example 
throughout the paper. SLAM is a core competency in mobile robotics, as it provides 
the necessary data for many other important tasks such as planning and manipu- 
lation, in addition to direct applications such as 3D modeling, exploration, and re- 
connaissance. The uncertainty inherent in sensor measurements makes probabilistic 
inference algorithms the favorite choice for SLAM. And because online operation 
is essential for most real applications, efficient incremental online algorithms are 
important and are at the focus of this paper. 

Taking a graphical model perspective to probabilistic inference in SLAM has a 
rich history JQi and has especially led to several novel and exciting developments 
in the last years |l27l[l0l[nj[l2l[n][3Tl- Paskin proposed the thin junction tree fil- 
ter (TJTF) ll27l . which provides an incremental solution directly based on graphical 
models. However, filtering is applied, which is known to be inconsistent when ap- 
plied to the inherently nonlinear SLAM problem 1201 . i.e., the average taken over a 
large number of experiments diverges from the true solution. In contrast, /m// SLAM 
Il34l retains all robot poses and can provide an exact solution, which does not suffer 
from inconsistency. Folkesson and Christensen presented Graphical SLAM ifTOl . a 
graph-based full SLAM solution that includes mechanisms for reducing the com- 
plexity by locally reducing the number of variables. More closely related, Frese's 
Treemap lfT2l performs QR factorization within nodes of a tree that is balanced over 
time. Sparsification is applied to prevent nodes from becoming too large, introduc- 
ing approximations by duplication of variables. 

The sparse linear algebra perspective has been explored by Dellaert et al. ||6l |7] 
l23l in Smoothing and Mapping (SAM), an approach that exploits the sparsity of 
the smoothing information matrix. The matrices associated with smoothing are typ- 
ically very sparse, and one can do much better than the cubic complexity associated 
with factorizing a dense matrix 11241. Kaess et al. Il22l |23]| proposed incremental 
smoothing and mapping (iSAM), which performs /ai'f incremental updates of the 
square root information matrix, yet is able to compute the full map and trajectory at 
any time. New measurements are added using matrix update equations ifTSllTSlfTTl . 
so that previously calculated components of the square root information matrix are 
reused. However, to remain efficient and consistent, iSAM requires periodic batch 
steps to allow for variable reordering and relinearization, which is expensive and 
detracts from the intended online nature of the algorithm. 

To combine the advantages of the graphical model and sparse linear algebra per- 
spective, we propose a novel data structure, the Bayes tree. Our approach is based 
on viewing matrix factorization as eliminating a factor graph into a Bayes net, which 
is the graphical model equivalent of the square root information matrix. Performing 
marginalization and optimization in Bayes nets is not easy in general. However, a 
Bayes net resulting from elimination/factorization is chordal, and it is well known 
that a chordal Bayes net can be converted into a tree-structured graphical model in 
which these operations are easy. The most well-known such data structure is the 
clique tree ll30l [Tl. also known as the junction tree in the Al literature Q, which has 
already been exploited for distributed inference in SLAM ll8l [27]| . However, the new 
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data structure we propose here, the Bayes tree, is directed and corresponds more 
naturally to the result of the QR factorization in linear algebra, allowing us to an- 
alyze it in terms of conditional probability densities in the tree. We further show 
that incremental inference corresponds to a simple editing of this tree, and present a 
novel incremental variable ordering strategy. 

Exploiting this new data structure and the insights gained, we propose a novel in- 
cremental exact inference method that allows for Incremental reordering and 
just-in-time relinearization. To the best of our knowledge this is a completely 
novel approach to providing an efficient and exact solution to a sparse nonlin- 
ear optimization problem in an incremental setting, with general applications be- 
yond SLAM. While standard nonlinear optimization methods repeatedly solve a 
linear batch problem to update the linearization point, our Bayes tree-based algo- 
rithm allows fluid relinearization of a reduced set of variables which translates into 
higher efficiency, while retaining sparseness and full accuracy. We compare our new 
method to iSAM using multiple publicly available datasets in both landmark and 
pose SLAM settings. 

2 Problem Statement 

We use & factor graph ll25l to represent the SLAM problem in terms of graphical 
models. Formally, a factor graph is a bipartite graph G — (J^, 0, (?) with two node 
types: factor nodes fi^.'P and variable nodes Oj G 0. Edges etj £ to are always 
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Fig. 1 (top) The factor graph and the associated Jacobian matrix A for a small SLAM exam- 
ple, where a robot located at successive poses xi , X2, and X3 makes observations on landmarks 
1 1 and I2. In addition there is an absolute measurement on the pose xi. (bottom) The chordal 
Bayes net and the associated square root information matrix R resulting from eliminating 
the factor graph using the elimination ordering /[, I2, xi, X2, X3. Note that the root, the last 
variable to be eliminated, is shaded darker. 
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Algorithm 1. Eliminating a variable 0, from the factor graph. 

1. Remove from the factor graph all factors /,(0,) that are adjacent to dj. Define the sepa- 
rator Sj as all variables involved in those factors, excluding dj. 

2. Form the (unnormalized) joint density fjoi„,{Qj,Sj) = WifiiOj) as the product of those 
factors. 

3. Using the chain rule, factorize the joint density fjoj„i{Oj,Sj) = P{dj\Sj)f„i,w{Sj)- Add the 
conditional P{dj\Sj) to the Bayes net and the factor f„ew{Sj) back into the factor graph. 



between factor nodes and variables nodes. A factor graph G defines the factorization 
of a function /(0) as 

/(0)=n./;(0<) (1) 

where 0, is the set of variables 6j adjacent to the factor fi, and independence rela- 
tionships are encoded by the edges e,-^: each factor fi is a function of the variables 
in 0,. An example of a SLAM factor graph is shown in Fig. [H top). 
When assuming Gaussian measurement models 

/;(0,)-exp(^-l||/;,(0<)-Z/|lz,) (2) 

as is standard in the SLAM literature Il32l l3ll9l, the factored objective function we 
want to maximize ([T]l corresponds to the nonlinear least-squares criterion 

argmin(-log/(0)) = avgmm^'^\\h,i0i) ~ ZiW^ (3) 

where hi{©i) is a measurement function and z/ a measurement, and ||e|| J ~ e^E^^e 
is defined as the squared Mahalanobis distance with covariance matrix E. 

A crucial insight is that inference can be understood as converting the factor 
graph to a Bayes net using the elimination algorithm. Variable elimination |[T]|4| 
originated in order to solve systems of linear equations, and was first applied in 
modern times by Gauss in the early 1800s llT4l . 

In factor graphs, elimination is done via a bipartite elimination game, as de- 
scribed by Heggernes and Matstoms |fT9]| . This can be understood as taking apart 
the factor graph and transforming it into a Bayes net 1291 . One proceeds by elimi- 
nating one variable at a time, and converting it into a node of the Bayes net, which is 
gradually built up. After eliminating each variable, the reduced factor graph defines 
a density on the remaining variables. The pseudo-code for eliminating a variable 
Qj is given in Algorithm [1] After eliminating all variables, the Bayes net density is 
defined by the product of the conditionals produced at each step: 

p{e) = X{p{dj\Sj) (4) 

The result of this process for the example in Fig. [TJ top) is shown in Fig.lHbottom). 
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3 The Bayes Tree 



The Bayes net resulting from elimination/factorization is chordal, and it can be con- 
verted into a tree- structured graphical model in which optimization and marginal- 
ization are easy. In this paper we introduce a new data structure, the Bayes tree, to 
better capture the equivalence with linear algebra and enable new algorithms in re- 
cursive estimation. A Bayes tree is a directed tree where the nodes represent cliques 
Cji of the underlying chordal Bayes net. In this respect Bayes trees are similar to 
clique trees, but a Bayes tree is directed and is closer to a Bayes net in the way it en- 
codes a factored probability density. In particular, we define one conditional density 
P{Fk\Sk) per node, with the separator S^ as the intersection Q n -0^. of the clique Q 
and its parent clique U^, and the, frontal variables F^ as the remaining variables, i.e. 

Fk = Ck\Sk- We write Ck = Fk : Sk- This leads to the following expression for the 
joint density P{0) on the variables defined by a Bayes tree, 



P{0) 



YlPiFklSk) 

k 



(5) 



where for the root Fr the separator is empty, i.e., it is a simple prior P{Fr) on the 
root variables. The way Bayes trees are defined, the separator Sk for a clique Q is 
always a subset of the parent clique Tlk, and hence the directed edges in the graph 
have the same semantic meaning as in a Bayes net: conditioning. 

Every chordal Bayes net can be transformed into a tree by discovering its cliques. 
Discovering cliques in chordal graphs is done using the maximum cardinality search 
algorithm by Tarjan and Yannakakis 1331 . which proceeds in reverse elimination 
order to discover cliques in the Bayes net. The algorithm for converting a Bayes net 
into a Bayes tree is summarized in Algorithm [2] and the corresponding Bayes tree 
for the small SLAM example in Fig.[T|is shown in Fig. [21 

Gaussian Case. In practice one always considers a linearized version of problem 
(O. If the measurement models hi in equation (O are nonlinear and a good lineariza- 
tion point is not available, nonlinear optimization methods such as Gauss-Newton 
iterations or the Levenberg-Marquardt algorithm solve a succession of linear ap- 
proximations to (O in order to approach the minimum. 
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Fig. 2 The Bayes tree and the associated square root information matrix R describing the 
clique structure in the Bayes net from Fig. [T] A Bayes tree is similar to a clique tree, but 
is better at capturing the formal equivalence between sparse linear algebra and inference in 
graphical models. The association of cliques with rows in the R factor is indicated by color. 
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Algorithm 2. Creating a Bayes tree from the chordal Bayes net resuhing from ehm- 

ination (Algorithm[T]l- 

For each conditional density P{6j\Sj) of the Bayes net, in reverse elimination order: 

If no parent (Sj = {}) 

start a new root clique F,- containing dj 
else 

identify parent clique Cp that contains the first eliminated variable of Sj as a frontal 
variable 

if nodes Fp U Sp of parent clique Cp are equal to separator nodes Sj of conditional 

insert conditional into clique Cp 
else 

start new clique C' as child of Cp containing dj 



At each iteration of the nonlinear solver, we linearize around a linearization point 
to get a new, linear least-squares problem in x with the objective function 

-log/(x) = l||Ax-bf (6) 

where A G K"'*^" is the measurement Jacobian consisting of m measurement rows 
and X is an n-dimensional tangent vector of a minimal representation IfTSll . Note 
that the covariances Z,- have been absorbed into the corresponding block rows of A, 



T „ 1 
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"2X 



2 



making use of ||x|||; = x' U 'x = x' Z ^U ^x = Z Jx . The matrix A above 
is a sparse block-matrix, and its graphical model counterpart is a Gaussian factor 
graph with exactly the same structure as the nonlinear factor graph, see Fig-IU The 
probability density on x defined by this factor graph is the normal distribution 

P(x) oc e-i°g/W ^ exp |-- ||Ax - bf j (7) 

In Gaussian factor graphs, elimination is equivalent to sparse QR factoriza- 
tion of the measurement Jacobian. In Gaussian factor graphs, the chain rule 
fjomt{dj,Sj) = P{Oj\Sj)fnew{Sj) in Step 3 of Algorithm[T]can be implemented using 
Householder reflections or a Gram-Schmidt orthogonalization, in which case the 
entire elimination algorithm is equivalent to QR factorization of the entire measure- 
ment matrix A. To see this, note that, for Xj G M and s^- G K.' (the set of variables Sj 
combined in a vector of length /), the factor fjomt{xj,Sj) defines a Gaussian density 

fjomr{xj,Sj) oc exp<^ -- ||a(r;+A5Sy -b||^ I (8) 

where the dense, but small matrix A, = [a|As] is obtained by concatenating the vec- 
tors of partial derivatives of all factors connected to variable Xj. Note that a G M^, 
As G R^^' and b G M*^, with k the number of measurement rows of all factors 
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connected to Xj. The desired conditional P{xj\sj) is obtained by evaluating the joint 
((S]) for a given value of Sy, yielding 



P{xj\sj) oc exp I - - {xj + rsy - df \ 



(9) 



with r = a^ A5 and J = a^b, where a^ = (a^a) a^ is the pseudo-inverse of a. The 
new factor fnew{Sj) is obtained by substituting Xj = d — rSj back into ^: 



fnew{Sj)=S^pl --||A'S;--b'f [ 



(10) 



where A' = Aj — ar and b' = b — ad. The above is one step of Gram-Schmidt, inter- 
preted in terms of densities, and the sparse vector r and scalar d can be recognized 
as specifying a single joint conditional density in the Bayes net, or alternatively a 
single row in the sparse square root information matrix as indicated in Fig. [21 

Solving. The optimal assignment x* of the linear least-squares solution is the 
one that maximizes the joint density P{x) from (O. The optimal assignment x* can 
be computed in dynamic programming style in one pass from the leaves up to the 
root of the tree to define all functions, and then one pass down to retrieve the optimal 
assignment for all frontal variables, which together make up the variables x. The first 
pass is already performed during construction of the Bayes tree, and is represented 
by the conditional densities associated with each clique. The second pass recovers 
the optimal assignment starting from the root based on Q by solving 

Xj=d-rSj (11) 

for every variable .xy, which is equivalent to backsubstitution in sparse linear algebra. 

4 Incremental Inference 

We show that incremental inference corresponds to a simple editing of the Bayes 
tree, which also provides a better explanation and understanding of the otherwise 
abstract incremental matrix factorization process. In particular, we will now store 
and compute the square root information matrix R in the form of a Bayes tree ^. 
Incremental factorization/inference is performed by reinterpreting the top part of 
the Bayes tree again as a factor graph, adding to this the new factors, creating with 
a new elimination order a new Bayes tree from this "top", then reattaching to it 
the unaffected subtrees. When a new measurement is added, for example a factor 
f'{xj,Xji), only the paths between the cliques containing Xj and Xf (respectively) 
and the root are affected. The sub-trees below these cliques are unaffected, as are 
any other sub-trees not containing xy orxf. Fig.[3]shows how these steps are applied 
to our small SLAM example (originally in Fig.|2ll. The upper-left shows that adding 
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Fig. 3 Updating a Bayes tree with a new factor, based on the example in Fig. |2] (top left) 
The affected part of the Bayes tree is highlighted for the case of adding a new factor between 
Ji and J3. Note that the right branch is not affected by the change, (top right) The factor 
graph generated from the affected part of the Bayes tree, (bottom left) The chordal Bayes net 
resulting from eliminating the factor graph, (bottom right) The Bayes tree created from the 
chordal Bayes net, with the unmodified right "orphan" subtree from the original Bayes tree 
added back in. 



Algorithm 3. Updating the Bayes tree with new factors ^ . 

In: Bayes tree S/' , new linear factors ,^' 
Out: modified Bayes tree S/'' 

1. Remove top of Bayes tree and re-interpret it as a factor graph: 

a. For each affected variable, remove the corresponding clique and all parents up to the 
root. 

b. Store orphaned sub-trees S/'oi-ph of removed cliques. 

2. Add the new factors J?' into the resulting factor graph. 

3. Re-order and eliminate the factor graph into a Bayes net (Algorithm[TJ, and re-assemble 
into a new Bayes tree (Algorithmic- 

4. Insert the orphans S/'orph back into the new Bayes tree. 



the new factor between x\ and .q only affects the left branch of the tree. The entire 
process of updating the Bayes tree with a new factor is described in Algorithm[3] 
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To understand why only the top part of the tree is affected, we look at two impor- 
tant properties of the Bayes tree. These directly arise from it encoding information 
flow during elimination. First, during elimination, variables in each clique collect 
information from their child cliques via the elimination of these children. Thus, 
information in any clique propagates only upwards to the root. Second, the infor- 
mation from a factor enters elimination only when the first variable of that factor is 
eliminated. 

Combining these two properties, we see that a new factor cannot influence any 
other variables that are not successors of the factor's variables. However, a factor on 
variables having different (i.e. independent) paths to the root means that these paths 
must now be re-eliminated to express the new dependency between them. 

5 Incremental Reordering 

Choosing the right variable ordering is essential for the efficiency of a sparse ma- 
trix solution, and this also holds for the Bayes tree approach. An optimal ordering 
minimizes the fill-in, which refers to additional entries in the square root informa- 
tion matrix that are created during the elimination process. In the Bayes tree, fill-in 
translates to larger clique sizes, and consequently slower computations. Fill-in can 
usually not be completely avoided, unless the original Bayes net already is chordal. 
Finding the variable ordering that leads to the minimal fill-in is NP-hard. One typi- 
cally uses heuristics such as the column approximate minimum degree (COLAMD) 
algorithm by Davis et al. Q, which provide close to optimal orderings for many 
problems. 
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Fig. 4 For a trajectory with loop closing, two different optimal variable orderings based on 
nested dissection are shown in the top row, with the corresponding Bayes tree structure in 
the bottom row. For the incremental setting the left choice is preferable, as the most recent 
variables end up in the root, minimizing work in future updates. 
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While performing incremental inference in the Bayes tree, variables can be re- 
ordered at every incremental update, eliminating the need for periodic batch re- 
ordering. This was not understood in 1231 . because this is only obvious within the 
graphical model framework, but not for matrices. Reordering is only performed for 
the variables affected by the new factors. Finding an optimal ordering for this subset 
of variables does not necessarily provide an optimal overall ordering. However, we 
have observed that some incremental orderings provide good solutions, comparable 
to batch application of COLAMD. 

One particularly good ordering forces the affected variables to be eliminated last. 
This strategy provides a good ordering because new measurements almost always 
connect to recently observed variables. In particular, odometry measurements al- 
ways connect to the previous pose. In the exploration mode it is clear that if the 
most recent variables end up in the root, only a small part of the tree (optimally 
only the root) has to be reorganized in the next step. The more difficult case of a 
loop closure is visualized in Fig.|4] In the case of a simple loop, nested dissection 
provides the optimal ordering. The first cut can either (a) include the root, or (b) 
not include the root, and both solutions are equivalent in terms of fill-in. However, 
there is a significant difference in the incremental case: For the horizontal cut that 
does not include the most recent variable t^, that variable will end up further down 
in the tree, requiring larger parts of the tree to change in the next update step. The 
vertical cut, on the other hand, includes the last variable in the first cut, pushing it 
into the root, and therefore leading to smaller, more efficient changes in the next 
step. In order to deal with more general topologies than this simple example, we 
use a constrained version of the COLAMD algorithm, that allows keeping the last 
variables in the root while still obtaining a good overall ordering. 

6 Exact Incremental Inference with Fluid Relinearization 

In this section we use the Bayes tree in a novel algorithm for optimizing a set of non- 
linear factors that grows over time, which is directly applicable to online mapping. 
We have already shown how the Bayes tree is updated with new linear factors. We 
now discuss how to perform relinearization where needed, a process that we call 
fluid relinearization. Then we present a combined algorithm for adding nonlinear 
factors over time, while keeping the Bayes tree and the estimate up-to-date. 

The goal of our algorithm is to obtain an estimate for the variables (map and 
trajectory), given a set of nonlinear constraints that expands over time, represented 
by nonlinear factors ^. New factors ^' can arrive at any time and may add new 
variables Q' to the estimation problem. We take the most recent estimate as lin- 
earization point to solve a linearized system as a subroutine in an iterative nonlinear 
optimization scheme. The linearized system is represented by the Bayes tree ,'7 . 

Solving. When solving in a nonlinear setting, we obtain a delta vector A that is 
used to update the linearization point 0, as shown in Algorithm!!] Updates are often 
local operations that do not affect the solution of other parts of the map. Therefore 
we will consider variables unchanged for which the recovered delta changes by less 
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Algorithm 4. Solving the Bayes tree in the nonlinear case returns an update A that 

can be added to the current linearization point to obtain the current estimate for 

all variables © +A. 

In: Bayes tree 5^ 

Out: update A 

Starting from the root clique Cr = F,-: 

1 . For current clique C^ = Fj^ : S^ 

compute update A^- of frontal variables F^ using already computed values of parents Sjt 
and the local conditional density /'(Ft|5,i). 

2. For all variables Ai^j in Ai^ that change by more than a threshold a: 
recursively process each descendant containing such a variable. 



Algorithm 5. Fluid relinearization: The linearization points of select variables are 

updated based on the current delta A . 

In: nonlinear factors ^, linearization point 0, Bayes tree 5^, delta A 

Out: updated Bayes tree £^, updated linearization point 

1. Mark variables in A above threshold /3: 7 = {Aj e A\Aj> P}. 

2. Update linearization point for marked variables: 0j := 0j+Aj. 

3. Mark all cliques that involve marked variables 0j and all their ancestors. 

4. From the leaves to the top, if a clique is marked: 

a. Relinearize the original factors in ^ associated with the clique. 

b. Add cached marginal factors from any unmarked children. 

c. Re-eliminate. 



than a small threshold a. For a clique that does not contain any variables that are 
considered changed, the subtrees will not be traversed. To be exact, the different 
units of variables have to be taken into account, but one simple solution is to take 
the minimum over all thresholds. 

Fluid Relinearization. The idea behind just-in-time or fluid relinearization is 
to keep track of the validity of the linearization point for each variable, and only 
relinearize when needed. This represents a departure from the conventional lin- 
earize/solve approach that currently represents the state of the art, and can be viewed 
as a completely new algorithm for nonlinear optimization. For a variable that is cho- 
sen to be relinearized, all relevant information has to be removed from the Bayes tree 
and replaced by relinearizing the corresponding original nonlinear factors. Cliques 
that are re-eliminated have to take into account also the marginal factors that get 
passed up from subtrees. We cache those marginals during elimination to avoid hav- 
ing to re-eliminate unmarked cliques to obtain them. Algorithm|5]shows the overall 
fluid relinearization process. 

Now we have all components for a fully incremental nonlinear optimization al- 
gorithm that allows exact incremental inference for sparse nonlinear problems such 
as SLAM. The algorithm is summarized in Algorithm |6l and we provide a brief 
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Algorithm 6. Nonlinear iteration with incremental variable reordering and fluid re- 

linearization. 

In / out: Bayes tree S/' , linearization point 0, nonlinear factors ^ 
In: new nonlinear factors J?', new variables 0' 
Initialization: 5^ = 0, = 0, ^ = 

1. Add any new factors ^ := ^ U ^'. 

2. Initialize any new variables 0' and add := U 0'. 

3. Linearize new factors ^' to obtain J?/,„. 

4. Linear update step, applying Algorithm |3] to =^/,„. 

5. Solve for delta A with Algorithmic] 

6. Iterate Algorithm|5]until no more relinearizations occur 



discussion of its complexity here. We assume here that initialization is available and 
it is close enough to the global minimum to allow convergence - that is a general 
requirement of any direct solver method. The number of iterations needed to con- 
verge is typically fairly small, in particular because of the quadratic convergence 
properties of our algorithm near the minimum. For exploration tasks with a con- 
stant number of constraints per pose, the complexity is 0(1). In the case of loop 
closures the situation becomes more difficult, and the most general bound is that for 
full factorization, 0{rv'), where n is the number of variables (poses and landmarks if 
present). Under certain assumptions that hold for many SLAM problems, the com- 
plexity is bounded by 0(«''^) |!24l. It is important to note that this bound does not 
depend on the number of loop closings. It should also be noted that our incremental 
algorithm is often much faster than a full factorization, as we show below. 



7 Experimental Results 

This section describes the experiments that validate the presented approach, using 
both synthetic and real datasets that are publicly available. We compare our estima- 
tion and timing results with a state of the art incremental algorithm [23) in order 
to highlight the advantages of fluid relinearization and incremental reordering. We 
have implemented the batch and iSAM algorithms using the same Bayes tree li- 
brary to provide a comparison of the algorithms, rather than a comparison of differ- 
ent implementations. All results are obtained with a research C-n- implementation, 
running single-threaded on a laptop with Intel Core 2 Duo 2.2 GHz processor, and 
using the COLAMD algorithm by Davis et al. 15|- We use the thresholds a = 0.005 
and /3 = 0.05 for solving and relinearization, respectively. 

We evaluate the timing of our Bayes tree algorithm on the Victoria Park dataset, 
an often-used benchmark SLAM dataset ll23l[28ll courtesy of H. Durrani- Whyte and 
E. Nebot. This dataset includes 6969 laser scans with the corresponding odometry 
readings. The laser scans are processed to detect the trunks of the trees in the park, 
which are used as landmarks. Fig. |5] shows the final trajectory estimate together 
with the detected landmarks. In this figure the trajectory is colored according to 
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Fig. 5 The Victoria Park dataset (top) and the simulated Manhattan world dataset (bottom) 
after optimization, color coded with the number of variables that ai'e updated for every step 
along the trajectory. Green corresponds to a low number of variables, red to a high number. 



the number of variables our algorithm had to recalculate at each step, where green 
represents a small number of variables (order of 10), yellow a moderate number, 
and red finally a large number (order of hundreds of variables). A relatively small 
portion of the trajectory is colored red, mainly the part at the bottom where the 
vehicle closed loops multiple times, re-visiting the same location up to eight times. 
In Fig. |6l we compare per-step timing on the Victoria Park dataset between our 
algorithm and the original iSAM algorithm ll23l (both implemented using the same 
library as noted above). The results show that our fully incremental algorithm does 
not suffer from the periodic spikes in iSAM. Our algorithm also performs better in 
cumulative time, while providing the additional advantage of continuously updating 
the linearization point of all variables having significant changes. 

To evaluate the accuracy of our algorithm, we use the simulated Manhattan world 
from Il26l . courtesy of E. Olson. Fig.|7]shows that the normalized chi-square values 
follow the least-squares batch solution, providing a nearly exact solution in every 
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Fig. 6 Timing comparison for the Victoria Park dataset. The top row shows per step timing 
and the bottom row shows the cumulative time. Our new algorithm (red) provides an im- 
provement in speed over the original iSAM algorithm (blue), in addition to its advantages 
of eliminating periodic batch factorization and performing fluid relinearization. The original 
iSAM algorithm included a batch step every 100 iterations, which is clearly visible from the 
spikes. 
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Fig. 7 Comparison of normalized ;^ for the simulated Manhattan world. iSAM shows some 
spikes where it deviates from the least squares solution because relinearization is only per- 
formed every 100 steps. The Bayes tree solution is always very close to the least squares 
solution because of the fluid relinearization (j3 = 0.05). 



step. While ISAM also converges to the exact solution, it shows some spikes related 
to relinearization only being performed in the periodic batch steps. Final cumulative 
times for providing a full solution in every step are 19.4.S and 47. 65 for our algorithm 
and iSAM, respectively. Fig.[5]shows the estimated trajectory for the simulated Man- 
hattan world, again using the same color coding for the number of variables that had 
to be recalculated in each step. 

Finally, we evaluated timing results on the Intel dataset, courtesy of D. Haehnel 
and D. Fox. This dataset was preprocessed by laser scan-matching, resulting in a 
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pose graph formulation without landmarks, containing about 4000 poses. The final 
cumulative times are 44.4.5 and n2.6s for our algorithm and iSAM, respectively. 

8 Conclusion 

We have presented a novel data structure, the Bayes tree, that provides an algorith- 
mic foundation which enables new insights into existing graphical model inference 
algorithms and sparse matrix factorization methods. These insights have led us to 
a fully incremental algorithm for nonlinear least-squares problems as they occur in 
mobile robotics. We have used SLAM as an example application, even though the 
algorithm is also suitable for other incremental inference problems, such as object 
tracking and sensor fusion. Our novel graph-based algorithm should also allow for 
better insights into the recovery of marginal covariances, as we believe that sim- 
ple recursive algorithms in terms of the Bayes tree are formally equivalent to the 
dynamic programming methods described in pij. The graph based structure also 
provides a starting point for exploiting parallelization that is becoming available in 
newer processors. 
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Monte Carlo Value Iteration for 
Continuous-State POMDPs 

Haoyu Bai, David Hsu, Wee Sun Lee, and Vien A. Ngo 



Abstract. Partially observable Markov decision processes (POMDPs) have been 
successfully applied to various robot motion planning tasks under uncertainty. 
However, most existing POMDP algorithms assume a discrete state space, while 
the natural state space of a robot is often continuous. This paper presents Monte 
Carlo Value Iteration (MCVI) for continuous-state POMDPs. MCVI samples both 
a robot's state space and the corresponding belief space, and avoids inefficient a 
priori discretization of the state space as a grid. Both theoretical results and prelimi- 
nary experimental results indicate that MCVI is a promising new approach for robot 
motion planning under uncertainty. 



1 Introduction 

A challenge in robot motion planning and control is the uncertainty inherent in 
robots' actuators and sensors. Incorporating uncertainty into planning leads to much 
more reliable robot operation. 

Partially observable Markov decision processes (POMDPs) provide a principled 
general framework for modeling uncertainty and planning under uncertainty. Al- 
though POMDPs are computationally intractable in the worst case 1 13 1, point-based 
approximation algorithms have drastically improved the speed of POMDP plan- 
ning in recent years llT2l[T4l[T9ll20l . Today, the fastest POMDP algorithms, such as 
HSVI 1191 and SARSOP |T2"|, can solve moderately complex POMDPs with hun- 
dreds of thousands states in reasonable time. POMDPs have been used successfully 
to model a variety of robotic tasks, including navigation ||3l[T8l, grasping f8|, target 
tracking ||T0|[T4| . and exploration |19|. Most of the existing point-based POMDP 
algorithms, however, assume a discrete state space, while the natural state space for 
a robot is often continuous. Our primary goal is to develop a principled and practical 
POMDP algorithm for robot motion planning in continuous state spaces. 
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If the state space S is continuous, one common way of using existing POMDP 
algorithms would be to place a regular grid over S and construct a discrete POMDP 
model first. The difficulty with this approach is that the number of states grow ex- 
ponentially with the robot's degrees of freedom (DoFs), resulting in the "curse of 
dimensionality" well known in geometric motion planning (without uncertainty). 
The effect of a large number of states is in fact aggravated in POMDP planning. 
Due to uncertainty, the robot's state is not known exactly and is modeled as a belief, 
which can be represented as a probability distribution over S. We plan in the belief 
space B, which consists of all possible beliefs. The result of POMDP planning is 
a policy, which tells the robot how to act at any belief b E B. A standard belief 
representation is a vector b, in which an entry b{s) specifies the probability of the 
robot being in the discretized state s G S. The dimensionality of B is then equal to 
the number of states in the discrete POMDP model. 

Probabilistic sampling is a powerful idea for attacking the curse of dimension- 
ality ||23]| . In geometric motion planning, the idea of probabilistically sampling a 
robot's configuration space led to tremendous progress in the last two decades IQ. 
Similarly, in POMDP planning, a key idea of point-based algorithms is to sample a 
small set of points from the belief space B as an approximate representation of B 
rather than represent B exactly. However, this is not enough, if the robot's state space 
S is continuous. To compute a policy, we need to evaluate the effect of executing 
a sequence of actions with an initial belief b. Conceptually, we apply the sequence 
of actions to each state s E S and average the execution results with probabilistic 
weights b{s). It is clearly impossible to perform this computation exactly in finite 
time, as there are infinitely many states in a continuous state space S. 

In this paper, we propose Monte Carlo Value Iteration (MCVI) for continuous 
state POMDPs. MCVI samples both a robot's state space S and the corresponding 
belief space B, and avoids inefficient a priori discretization of the state space as a 
grid. The main technical innovation of MCVI is to use Monte Carlo sampling in 
conjunction with dynamic programming to compute a policy represented as a fi- 
nite state controller. We show that, under suitable conditions, the computed policy 
approximates the optimal policy with a guaranteed error bound. We also show pre- 
liminary results of the algorithm applied to several distinct robotic tasks, including 
navigation, grasping, and exploration. 

In the following, we start with some preliminaries on POMDPs and related work 
(Section |2ll. Next, we describe the main idea of MCVI and the algorithmic details 
(Section|3]l. We then present experimental results (SectionlHi. Finally, we conclude 
with some remarks on future research directions. 



2 Background 

2.1 Preliminaries on POMDPs 

A POMDP models an agent taking a sequence of actions under uncertainty to max- 
imize its total reward. In each time step, the agent takes an action a E A and moves 
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from a state s G S to s' G S, where S and A are the agent's state space and action 
space, respectively. Due to the uncertainty in actions, the end state s' is represented 
as a conditional probability function T{s, a, s') = p{s'\s, a), which gives the prob- 
ability that the agent lies in s', after taking action a in state s. The agent then takes 
an observation o G O, where O is the observation space. Due to the uncertainty in 
observations, the observation result is also represented as a conditional probability 
function Z{s' , a, o) = p(o|s', a) for s' € S and a G A. To elicit desirable agent be- 
havior, we define a reward function R{s, a). In each time step, the agent receives a 
real- valued reward R{s, a), if it is in state s e 5 and takes action a G A. The agent's 
goal is to maximize its expected total reward by choosing a suitable sequence of 
actions. When the sequence of actions has infinite length, we typically specify a 
discount factor 7 e (0, 1) so that the total reward is finite and the problem is well 
defined. In this case, the expected total reward is given by E(^^q 7*i?(st, at)), 
where St and at denote the agent's state and action at time t, respectively. 

The goal of POMDP planning is to compute an optimal policy tt* that maximizes 
the agent's expected total reward. In the more familiar case where the agent's state 
is fully observable, a policy prescribes an action, given the agent's current state. 
However, a POMDP agent's state is partially observable and modeled as a belief, 
i.e., a probability distribution over S. A POMDP policy tt: B ^ A maps a belief 
b & B to the prescribed action a G A. 

A policy TT induces a value function V^: B ^ R. The value of b with respect to tt 
is the agent's expected total reward of executing vr with initial belief b: 

oc 

V^{b)^E(j2l'R{^uat)\TT,by (1) 

t=o 

If the action space and the observation spaces of a POMDP are discrete, then the 
optimal value function V* can be approximated arbitrarily closely by a piecewise- 
linear, convex function IITSl : 

V{b) = max / a{s)b{s) ds, (2) 

""^^ Jses 

where each a g -T is a function over S and commonly called an a-function. If 
the state space is also discrete, we can represent beliefs and a-functions as vectors 
and replace the integral in (O by a sum. For each fixed a, h{b) = J2s£S Oiis)b{s) 
then defines a hyperplane over B, and V{b) is the maximum over a finite set of 
hyperplanes at b. In this case, it is clear why V{b) is piecewise-linear and convex. 

POMDP policy computation is usually performed offline, because of its high 
computational cost. Given a policy tt, the control of the agent's actions is performed 
online in real time. It repeatedly executes two steps. The first step is action selection. 
If the agent's current belief is b, it takes the action a — 7r(6), according to the given 
policy TT. The second step is belief update. After taking an action a and receiving an 
observation o, the agent updates its belief: 
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b'{s') ^ T{b,a,o) ^ riZ{s' ,a,o) / T{s,a,s')b{s) ds, (3) 

Jses 
where 77 is a normalizing constant. 

More information on POMDPs is available in lfTni22l . 



2.2 Related Work 

POMDPs provide a principled general framework for planning under uncertainty, 
but they are often avoided in robotics, because of their high computational com- 
plexity. In recent years, point-based POMDP algorithms made significant progress 
in computing approximate solutions to discrete POMDPs lfT2l [T4l [T9l [201. Their 
success hinges on two main ideas. First, they sample a small set of points from 
the belief space B and use it as an approximate representation of B. Second, they 
approximate the optimal value function as a set of a-vectors. The a-vectors allow 
partial policies computed at one belief point to be used for other parts of B when 
appropriate, thus bringing substantial gain in computational efficiency. 

In comparison, progress on continuous POMDPs has been much more limited, 
partly due to the difficulty of representing beliefs and value functions for POMDPs 
when high-dimensional, continuous state spaces are involved. As mentioned ear- 
lier, discretizing the state space with a regular grid often results in an unaccept- 
ably large number of states. One idea is to restrict beliefs and value functions to 
a particular parametric form, e.g., a Gaussian l(3l fT6l or a linear combination of 
Gaussians B UTSI . For robots in complex geometric environments with many ob- 
stacles, uni-modal distributions, such as the Gaussian, are often inadequate. In the- 
ory, a linear combination of Gaussians can partially address this inadequacy. How- 
ever, when the environment geometry contains many "discontinuities" due to ob- 
stacles, the number of Gaussian components required often grows too fast for the 
approach to be effective in practice. Other algorithms, such as MC-POMDP II2TI 
and Perseus [15|, use particle filters to represent beliefs. Perseus still uses a linear 
combination of Gaussians for value function representation and thus suffers some 
of the same shortcomings mentioned above. MC-POMDP represents a value func- 
tion by storing its values at the sampled belief points and interpolating over them 
using Gaussians as kernel functions and KL divergence as the distance function. 
Interpolation in a belief space is not easy. KL divergence does not satisfy the met- 
ric properties, making it difficult to understand the interpolation error. Furthermore, 
choosing suitable parameter values for the Gaussian kernels involves some of the 
same difficulties as those in choosing an a priori discretization of the state space. 

MCVI also uses the particle-based belief representation, but it exploits one key 
successful idea of point-based discrete POMDP algorithms: the a-vectors. It cap- 
tures the a-functions implicitly as a policy graph ISl fTll and retains their main bene- 
fits by paying a computational cost. To construct the policy graph, MCVI makes use 
of approximate dynamic programming by sampling the state space and performing 
Monte Carlo (MC) simulations. Approximate dynamic programming has also been 
used in policy search for Markov decision processes (MDPs) and POMDPs without 
exploiting the benefits of a-functions [Ij . 
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MCVI takes the approach of offline policy computation. An alternative is to per- 
form online search fTT', T]. These two approaches are complementary and can be 
combined to deal with challenging planning tasks with long time horizons. 



3 Monte Carlo Value Iteration 

In this paper, we focus on the main issue of continuous state spaces and make the 
simplifying assumption of discrete action and observation spaces. 

3.1 Policy Graphs 

One way of representing a policy is a policy graph G, which is a directed graph 
with labeled nodes and edges. Each node of G is labeled with an action a G A, 
and each edge of G is labeled with an observation o E O. To execute a policy 
ttg represented this way, we use a finite state controller whose states are the nodes 
of G. The controller starts in a suitable node v of G, and a robot, with initial belief b, 
performs the associated action a« . If the robot then receives an observation o, the 
controller transitions from d to a new node v' by following the edge {v, v') with 
label o. The process then repeats. The finite state controller does not maintain the 
robot's belief explicitly, as in Q. It encodes the belief implicitly in the controller 
state based on the robot's initial belief b and the sequence of observations received. 
For each node v of G, we may define an a-function ay. Let ttc.v denote a partial 
policy represented by G, when the controller always starts in node v of G. The value 
a^ (s) is the expected total reward of executing Tr^t, with initial robot state s: 

oc oo 

a„(s) = E(^ 7*i?(st, at)) = R{s, a„) + E(^ j'R{st,at)) (4) 

t=o t=i 

Putting Q together with ([T]) and (O, we define the value of b with respect to ttq as 
Vg(6) = max / ay{s)b{s)ds. (5) 



v£G 



ses 



So Vg is completely determined by the a-functions associated with the nodes of G. 



3.2 MC-Backup 

The optimal POMDP value function V* can be computed with value iteration (VI), 
which is based on the idea of dynamic programming Q . An iteration of VI is com- 
monly called a backup. The backup operator H constructs a new value function 
Vt+i from the current value function Vt'. 

Vt+i{b) = HVtib) = max |i?(6,a) +7 Vp(o|6,a)yt (6') I , (6) 

oGO 



^\02 
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where R{b, a) = J^ g R{s, a)b{s)ds is the robot's expected immediate reward and 
b' = T{b, a, o) is the robot's next belief after it takes action a and receives observa- 
tion o. At every b € B, the backup operator H looks ahead one step and chooses the 
action that maximizes the sum of the expected immediate reward and the expected 
total reward at the next belief. Under fairly general conditions, Vt converges to the 
unique optimal value function V* . 

Representing a value function as a set of a-functions has many benefits, but stor- 
ing and computing a-functions over high-dimensional, continuous state spaces is 
difficult (Section [2.2b . We do not represent a value function explicitly as a set of 
a-functions, but instead represent it implicitly as a policy graph. Let Vo denote the 
value function for the current policy graph G. Substituting ^ into Q, we get 

HVaib) = Ta-&yi{ I R{s,a)b{s)ds + ^y p(o\b,a)iLRax av{s)b'{s)ds>. 

•^eA IJ.^s ^ veG J^^s J 

(7) 
Q Let us first evaluate (|7]) at a particular point b G 

,--'~s (5j /'"A'^ ^ ^^'-^ construct the resulting new policy graph 

\_^l^'^'' '^2 G' , which contains a new node u and a new edge 

from u for each o ^ O (Fig. [T]). Since we do 
' not maintain a-functions explicitly, it seems dif- 

'~)v^X ^^ ficult to compute the integral /^^^ a^ {s)b'{s)ds. 

V^^X / ^ However, the definition of «„ in ^ suggests 

computing the integral by MC simulation: re- 
peatedly sample a state s with probability 6'(s) 
Fie. 1 Backup a policy graph G. The , ., . i » »i i- r> u- e 

, , , ,. ^. f: f , and then simulate the policy nn v Pushing lur- 

dashed lines indicate the new node , , . . , ■ r ' , , 

, , ther on this idea, we can m tact evaluate the en- 

and edges. j_, 

tire right-hand side of (17|l via sampling and MC 

simulation, and construct the new policy graph G". We call this MC-backup of G 
at b (Algorithm[T])- 

Conceptually, Algorithm[T]considers all possible ways of generating G'. The new 
node u in G' has \A\ possible labels, and each outgoing edge from u has \G\ possible 
end nodes in G, where \G\ denotes the number of nodes in G (Fig. [Hi. Thus, there are 
|j4||G|I'^I candidates for G'. Each candidate graph G' defines a new policy ttc',u- 
We draw N samples to estimate the value of b with respect each candidate ttc',u- For 
each sample, we pick s from the state space S with probability b{s) . We run an MC 
simulation under TTa>,u, starting from s, for L steps and calculate the total reward 
St=o J*R{^t,cit)- The simulation length L is selected to be is sufficiently large so 
that the error due to the finite simulation steps is small after discounting. We then 
choose the candidate graph with the highest average total reward. Unfortunately, 
this naive procedure requires an exponential number of samples. 

Algorithm[T]computes the same result, but is more efficient, using only A^|yl| |G| 
samples. The loop in line 3 matches the maximization over actions a G A in ([7]). 
The loop in line 4 matches the first integral over states s G S and the sum over 
observations o G O. The loop in line 8 matches the maximization over nodes v G G. 
The three nested loops generate the simulation results and store them in Va,o,v for 
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Algorithm 1 . Backup a policy graph G at a belief b with N samples. 
MC-Backup(G, &, AT) 

1. For each action a £ A, Ra ^ 0. 

2. For each action a £ A, eacli observation o £ O, and each node v £ G, Va,o,v ^- 0. 

3. for eacli action a £ Ado 

4. for i = 1 to Af do 

5. Sample a state Si witli probability distribution b{si). 

6. Simulate taking action a in state Si. Generate the new state s^ by sampling from 
the distribution T{si,a, s'i) — p{s'i\si, a). Generate the resulting observation Oi by 
sampling from the distribution Z{s'i, a, Oi) = p{oi\s'i, a). 

7. Ra^ Ra + R{Si,a). 

8. for each node v £ G do 

9. Set V' to be the expected total reward of simulating the policy represented by G, 
with initial controller state v and initial robot state s^. 

10. Va,o„v^Va,o„v + V'. 

1 1 . for each observation o £ O do 

12. Va,o ^ aiaxyeGVa,o,v, Ua, o ^ argmax„gQ Va,o,„. 

13. K^(i?a+7Eo6o/«.°)/^- 

14. V* ^ maxagA Va, a* ^ argmax^g^ Va- 

15. Create a new policy graph G' by adding a new node u to G. Label u with a* . For each 
o £ O, add the edge {u, Va* ,o) and label it with o. 

16. return G'. 



a G A,o G 0,andv € G. Using Va.o,v, one can compare the values at b with 
respect to any candidate policy graphs and choose the best one (lines 1 1-14). 

Interestingly, a relatively small number of samples are sufficient for MC-backup 
to be effective. Let HbVc denote the value function for the improved policy graph 
resulting from MC-backup of G at b. With high probability, HbVa approximates 
HVo well at b, with error decreasing at the rate 0{1/^/N). For simplicity, we as- 
sume in our analysis below that the simulation length L is infinite. Taking the finite 
simulation length into account adds another error term that decreases exponentially 
with L. 

Theorem 1. Let i?max be an upper bound on the magnitude of R{s, a) over s £ S 
and a £ A. Given a policy graph G and a point b £ B, MC-BaCKUP(G, b, N) 
produces an improved policy graph such that for any r £ (0, 1), 



I UM fk^ u M ^Ml <- 2^-- . 2(|Q|ln|G|+ln(2|A|) + ln(l/r)) 

with probability at least 1 — t. 

Proof. There are |A||G|I'^I candidates for the improved policy graph. Effectively 
MC-Backup uses N samples to estimate the value at b with respect to each candi- 
date and chooses the best one. 

First, we calculate the probability that all the estimates have small errors. Let 
CTi be a random variable representing the total reward of the ith simulation under a 
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candidate policy. Define a = X]i=i '^i/^ ■ Using Hoeffding's inequality, we have 
p{\(7 - E(cr)| > e) < 2e-^^'/2C^', where C = i?max/(l - 7) and e is a small 
positive constant. Let Ei denote the event that the estimate for the ith candidate 
policy has error greater than e. Applying the union bound p(ljj Ei) < '^^piEi), 
we conclude that the estimate for any of the candidate policy graphs has error 
greater than e with probability at most r — 2|j4||G'|l*^le~^'^ Z^*^ . So we set 

^ ^ ^. / 2(rO|ln|G|+ln(2|A|)+ln(l/T 



N 

Next, let G* denote the best candidate policy graph and G^^ denote the candidate 
graph chosen by MC-Backup. Let a* and a^^ be the corresponding estimates of 
the value at b in MC-Backup. Then, 

HVaib) - HbVaib) = E((T*) - EKJ 

-E(a*)-a*+a*-EKJ 
<E(a*)-a*+<,-EKJ, 

The inequality in the last line follows, as MC-BACKUP always chooses the candidate 
policy graph with the highest estimate. Thus <t* < <t^^. Finally, the result in the 
previous paragraph implies that \a* — E(cr*)| < e and \a^^ — E(ct*(,)| < e, with 
probability at least 1 — r. Hence, \HVc{b) — Hi,Vc{b)\ < 2e, and the conclusion 
follows. D 

Now we combine MC-backup, which samples the state space S, and point-based 
POMDP planning, which samples the belief space B. Point-based POMDP algo- 
rithms use a set B of points sampled from B as an approximate representation of 
B. Let 6b — supjggminb'gB \\b — b'\\i be the maximum Li distance from any 
point in B to the closest point in B. We say that B covers B well if 6b is small. 
Suppose that we are given such a set B. In contrast to the standard VI backup 
operator H, which performs backup at every point in B, the operator Hb applies 
MC-Backup(G, b, N) on a policy graph G at every point in B. Each invocation 
of MC-Backup (G, 6, N) returns a policy graph with one additional node added to 
G. We take a union of the policy graphs from all the invocations over b E B and 
construct a new policy graph G'. Let Vq be value function for some initial policy 
graph and Vt+i = HbVi. 

The theorem below bounds the approximation error between Vt and the optimal 
value function V* . 

Theorem 2. For every b E B and every r G (0,1), 



,,.,,,, ,,,,„^ 2J?^ax 2((|Q| + l)ln(|i3|t) + ln(2|A|) + ln(l/T)) 
\V (b) V,{b)\ < (T^^ V N 

:6p. 



(1-7)2 (I-7) ' 

with probability at least 1 — t. 
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To keep the proof simple, the bound is not tight. The objective here is to identify the 
main sources of approximation error and quantify their effects. The bound consists 
of three terms. The first term depends on how well MC-backup samples S (Algo- 
rithmlU line 5). It decays at the rate 0{l/\/N). We can reduce this error by taking 
a suitably large number of samples from S. The second term, which contains 5b, 
depends on how well B covers B. We can reduce Sb by sampling a sufficiently large 
set B to cover B well. The last term arises from a finite number t of MC-backup iter- 
ations and decays exponentially with t. Note that although MC-backup is performed 
over points in B, the error bound holds for every b ^ B. 

To prove the theorem, we need a Lipschitz condition on value functions: 

Lemma 1. Suppose that a POMDP value function V can be represented as or 
approximated arbitrarily closely by a set of a-functions. For any h, b' £ B, if 

\\b-b'\\i<5,then\V{b)-Vib')\ < ^^S. 

We omit the proof of Lemma[Tl as it is similar to an earlier proof 191 for the special 
case V — V*. We are now ready to prove Theorem|2l 

Proof (Theorem^. Let Cf — max^gB |^*(&) ~ Vt{b)\ be the maximum error of 
Vt{b) over b G B. First, we bound the maximum error of Vt{b) over any b E B in 
terms of et- For any point b £ B, let b' be the closest point in B to b. Then 

\V*{b) - Vt{b)\ < \V*{b) - V*{b')\ + \V*{b') - Vt{b')\ + \Vt{b') - Vt{b)\ 

Applying Lemma[T]twice to V* and Vt, respectively, and using |l/*(fe') — Vt{b')\ < 
et,weget 

\V*{b)-Vt{b)\<^^SB+et. (8) 

Next, we bound the error et- For any b' E B, 

\V*{b') - Vt{b')\ < \HV*{b') ~ Ht>Vt-i{b')\ 

< \HV*{b') - HVt^i{b')\ + \HVt-i{b') - Hi,.Vt-i{b% (9) 

The inequality in the first line holds, because by definition, V*{b') — HV*{b'), 
V*{b') > Vt{b'), and Vt{b') > Hb,Vt-i{b'). It is well known that the operator H is 
a contraction: \\HV — HV'\\oc < 7||^ — ^'lloo for any value functions V and V', 
where || • \\oo denotes the Loo norm. The contraction property and ^ imply 

\HV*{b') - HVt^i{b')\ < j^^^Sb + et-i). (10) 

Theorem [T] guarantees small approximation error with high probability for a sin- 
gle MC-backup operation. To obtain Vt, we apply H^ for t times and thus have 
\B\t MC-backup operations in total. Suppose that each MC-backup fails to achieve 
small error with probability at most T/\B\t. Applying the union bound together with 
TheoremlU every backup operation Hb' achieves 
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|HV,_,(6')-H.,V,_,(t')l < ;g^.Ml°l-(|B|t) + ln(2|A|) + l„(|B|,/r)) 

1 — 7 V jV 

with probability at least 1 — r. We then substitute the inequalities (IPUTTT) into the 
definition of e^ and derive a recurrence relation for et ■ For any initial policy graph, 
the error eg can be bounded by 2i?niax/(l ~ 7)- Solving the recurrence relation for 
et and substituting it into ^ gives us the final result. D 



3.3 Algorithm 

Theorem |2] suggests that by performing MC-backup over a set B of suitably sam- 
pled beliefs, we can approximate the optimal value function with a bounded error. 
To complete the algorithm, we need to resolve a few remaining issues. First, we 
need a method for sampling from the belief space and obtaining B. Next, Hg per- 
forms backup at every point in B, but for computational efficiency, we want to 
perform backup only at beliefs that lead to significant improvement in the value 
function approximation. Both issues occur in discrete POMDP algorithms as well 
and have been addressed in earlier work. Finally, we use particle filters |22J to repre- 
sent beliefs over continuous state spaces. Particle filtering can be implemented very 
efficiently and has been used with great success in important robotic tasks, such as 
localization and SLAM [221 ■ 

We now give a short description of the algorithm. It shares the same basic struc- 
ture with our SARSOP algorithm 1 12| for discrete POMDPs; however, it uses MC- 
backup and particle filtering to handle continuous state spaces. 

Overview. The algorithm computes an approximation to an optimal policy by up- 
dating a policy graph G. To improve G, it samples beliefs incrementally and per- 
forms backup at selected sampled beliefs. 

Let TZ C B he a subset of points reachable from a given initial belief bo E B 
under arbitrary sequences of actions and observations. Following the recent point- 
based POMDP planning approach, our algorithm samples a set of beliefs from this 
reachable space TZ rather than B for computational efficiency, as TZ is often much 
smaller than B. The sampled beliefs form a tree T-i^. Each node of T-,^ represents a 
sampled belief b € TZ, and the root of T,j is the initial belief bo. If b is a node of 
Ttc and b' is a child of b in T^, then b' = rib, a, o) for some a E A and o G O. By 
definition, the belief associated with every node in T^ lies in TZ. 

To sample new beliefs, our algorithm updates r,^ by performing a search in TZ. 
At each node 6 of T^, it maintains both upper and lower bounds on V*{b). We start 
from the root of T^ and traverse a single path down until reaching a leaf of T^. At 
a node b along the path, we choose action a that leads to the child node with the 
highest upper bound and choose observation o that leads to the child node making 
the largest contribution to the gap between the upper and lower bounds at the root of 
Ttj. These heuristics are designed to bias sampling towards regions that likely lead 
to improvement in value function approximation. If fe is a leaf node, then we use the 
same criteria to choose a belief b' among all beliefs reachable from b with an action 
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a G A and an observation o E O. We compute b' — T{b, a, o) using particle filtering 
and create a new node for 6' in T-,^ as a child of b. The sampling path terminates 
when it reaches a sufficient depth to improve the bounds at the root of Tt^ . We then 
go back up this path to the root and perform backup at each node along the way to 
update the policy graph as well as to improve the upper and lower bound estimates. 
We repeat the sampling and backup procedures until the gap between the upper and 
lower bounds at the root of T^ is smaller than a pre-specified value. 

Policy and lower bound backup. The lower bound at a tree node b is computed 
from the policy graph G. As G always represents a valid policy, Vc{b) is a lower 
bound of V*{b). We initialize G with a simple default policy, e.g., always perform- 
ing a single fixed action. To update the lower bound at 6, we perform MC-backup 
on G at b. As a result, we obtain an updated policy graph G" and an MC estimate of 
the value at b with respect to G' as an improved lower bound. 

Upper bound backup. To obtain the initial upper bound at a node b, one general 
approach is to apply standard relaxation techniques. Assuming that a robot's state 
variables are all fully observable, we can solve a corresponding MDP, whose value 
function provides an upper bound on the POMDP value function. By assuming that 
a robot's actions are deterministic, we can further relax to a deterministic planning 
problem. To update the upper bound at b, we use the standard backup operator. 

The upper and lower bounds in our algorithm are obtained via sampling and MC 
simulations, and are thus approximate. The approximation errors decrease with the 
number of samples and simulations. Since the bounds are only used to guide belief 
space sampling, the approximation errors do not pose serious difficulties. 

For lack of space, our algorithm description is quite brief. Some additional details 
that improve computational efficiency are available in 1 12|, but they are independent 
of the use of MC-backup and particle filtering to deal with continuous state spaces. 



4 Experiments 

We implemented MCVI in C-n- and evaluated it on three distinct robot motion plan- 
ning tasks: navigation, grasping, and exploration. In each test, we used MCVI to 
compute a policy. We estimated the expected total reward of a policy by running a 
sufficiently large number of simulations and averaging the total rewards, and used 
the estimate as a measure of the quality of the computed policy. As MCVI is a ran- 
domized algorithm, we repeated each test 10 times and recorded the average results. 
All the computation was performed on a computer with a 2.66 GHz Intel processor 
under the Linux operating system. 



4.1 Navigation 

This 1-D navigation problem first appeared in the work on Perseus fTSi, an earlier 
algorithm for continuous POMDPs. A robot travels along a corridor with four doors 
(Fig. [2|5[). The robot's goal is to enter the third door from the left. The robot has 
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Fig. 2 Navigation in a corridor, (a) Tlie environment, (b) Tlie observation function for the 
observation CORRIDOR, (c) Estimated expected total rewards of computed policies. 



three actions: MOVE-LEFT, MOVE-RIGHT, and ENTER. The robot does not know 
its exact location, but can gather information from four observations: LEFT-END, 
RIGHT-END, DOOR, and CORRIDOR, which indicate different locations along the 
corridor. Both the actions and observations are noisy. The robot receives a positive 
reward if it enters the correct door, and a negative reward otherwise. 

For comparison with Perseus, we use the same model as that in ifTSl . Perseus re- 
quires that all the transition functions, observation functions, and reward functions 
are modeled as a combination of Gaussians. See Fig. [JJ? for an illustration of the 
observation function for the observation CORRIDOR. Details of the model are avail- 
able in fT5\ . It is important to note that representing the entire model with Gaussians 
imposes a severe restriction. Doing so for more complex tasks, such as grasping and 
obstructed rock sample in the following subsections, is impractical. 

We ran MCVI with 600 particles for belief representation and N = 400 for 
MC-Backup. We also ran Perseus using the original authors' Matlab program, 
with parameter settings suggested in ifTSl . There are two versions of Perseus using 
different belief representations. One version uses Gaussian mixture, and the other 
one uses particle filtering. The results are plotted in Fig. [J];. The horizontal axis 
indicates the time required for policy computation. The vertical axis indicates the 
average total reward of a computed policy. Each data point is the average over 10 
runs of each algorithm. The error bars indicate the 95% confidence intervals. 

Since MCVI is implemented in C-n- and Perseus is implemented in Matlab, the 
running times are not directly comparable. However, the plot indicates that MCVI 
reaches the same performance level as Perseus, even though MCVI does not require 
a Gaussian model and does not take advantage of it. Also, the smaller error bars for 
MCVI indicate that it is more robust, especially when the planning time is short. 

The main purpose of this test is to compare with Perseus, a well-known earlier al- 
gorithm for continuous POMDPs. As one would expect, the task is relatively simple. 
We can construct a discrete POMDP model for it and compute a policy efficiently 
using discrete POMDP algorithms. 



Monte Carlo Value Iteration for Continuous-State POMDPs 187 

4.2 Grasping 

In this simplified grasping problem IS), a robot hand with two fingers tries to grasp a 
rectangular block on a table and lift it up (Fig. [3]). The fingers have contact sensors at 
the tip and on each side. Thus, each observation consists of outputs from all the con- 
tact sensors. The observations are noisy. Each contact sensor has a 20% probability 
of failing to detect contact, when there is contact, but 0% probability of mistakenly 
detecting contact, when there is none. Initially, the robot is positioned randomly 
above the block. Its movement is restricted to a 2-D vertical plane containing both 
the hand and the block. The robot's actions include four compliant guarded moves: 
MOVE-LEFT, MOVE-RIGHT, MOVE-UP, and MOVE-DOWN. Each action moves the 
robot hand until a contact change is detected. The robot also has OPEN and CLOSE 
actions to open and close the fingers as well as a LIFT action to lift up the block. If 
the robot performs LIFT with the block correctly grasped, it is considered a success, 
and the robot receives a positive reward. Otherwise, the robot receives a negative 
reward. In this problem, uncertainty comes from the unknown initial position of the 
robot hand and noisy observations. 

We ran MCVI with 150 particles for belief representation and N — 500 for 
MC-Backup. On the average, the planning time is 160 seconds, and the computed 
policy has a success rate of 99.7%. For comparison, we manually constructed a 
open-loop policy: MOVE-LEFT -^ move-down -^ move-right -^ move-up -+ 
MOVE-RIGHT -* MOVE-DOWN -^ CLOSE -^ LIFT. The success rate of this poHcy 
is only 77.2%. Many of the failures occur because the manually constructed policy 
does not adequately reason about noisy observations. 

Fig.[3]shows a simulation run of the computed policy. In one MOVE-LEFT action 
(Fig. [Sf'), the tip contact sensor of the left finger fails to detect the top surface of 
the block. At a result, the robot does not end the MOVE-LEFT action in the proper 
position, but it recovers from the failure when the tip contact sensor of the right 
finger correctly detects contact (Fig.ISji)- 

The grasping problem can also be modeled as a discrete POMDP [8|. However, 
this requires considerable efforts in analyzing the transition, observation, and reward 
functions. Although the resulting discrete POMDP model is typically more compact 
than the corresponding continuous POMDP model, the discretization process may 
be difficult to carry out, especially in complex geometric environments. In contrast, 
MCVI operates on continuous state spaces directly and is much easier to use. 

4.3 Obstructed Rock Sample 

The original Rock Sample problem lfT9l is a benchmark for new discrete POMDP 
algorithms. In this problem, a planetary rover explores an area and searches for rocks 
with scientific value. The rover always knows its own position exactly, as well as 
those of the rocks. However, it does not know which rocks are valuable. It uses the 
SENSE action to take noisy long-range sensor readings on the rocks. The accuracy 
of the readings depends on the distance between the rover and the rocks. The rover 
can also apply the SAMPLE action on a rock in the immediate vicinity and receive 
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Fig. 3 A simulation run of the simplified grasping task. The spheres at the tip and the sides of 
the fingers indicate contact sensors. They turn white when contact is detected. 



a positive or negative reward, depending on whether the sampled rock is actually 
valuable. The robot's goal is to find as many valuable rocks as possible quickly and 
then move to the right boundary of the environment to report the results. 

We extended Rock Sample in several ways to make it more realistic. We intro- 
duced obstructed regions, which the rover cannot travel through. Furthermore, the 
rover's movement is now noisy. In each time step, the rover can choose to move 
in any of eight equally spaced directions with two speed settings. Finally, the rover 
does not always know its own location exactly. It can only localize in the imme- 
diate vicinity of a rock, which serves as a landmark. We call this extended version 
Obstructed Rock Sample (ORS). 

We created three models of ORS by varying the noise levels for the rover's move- 
ments and long-range rock sensor. We ran MCVI on each model. The average plan- 
ning time ranges from 5 minutes for the low-noise model to a maximum of 2 hours. 

Fig.HJshows a simulation run for each computed policy. For the low-noise model 
(Fig. |4]a!), the rover first moves towards the top-left rock. It senses the rock and 
decides to sample it. It also senses the lower-left rock, but cannot determine whether 
the rock is valuable, because the rock is far away and the sensor reading is too noisy. 
The rover then approaches the lower-left rock and senses it again. Together the two 
sensor readings indicate that the rock is likely bad. So the rover does not sample 
it. Along the way, the rover also senses the top-right rock twice and decides that 
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Fig. 4 Simulations runs for three ORS models: (a) low noise in sensing and movements, 
(b) higher sensor noise, and (c) higher movement noise. Shaded polygons indicate obstructed 
regions. Shaded and white discs indicate the regions in which the rover may perform the 
SAMPLE action. The rocks are located at the center of the discs. Shaded discs represent valu- 
able rocks, and white discs represent bad rocks. Solid black curves indicates the rover's tra- 
jectories. Each "o" marks a location where the rover performs a SAMPLE action. Each "A" 
marks a location where the rover performs a SENSE action, and the corresponding dashed line 
indicates the rock being sensed. 



the rock is likely valuable. As the movement noise is low, the rover chooses to go 
through the narrow space between two obstacles to reach the rock and sample it. 
It then takes a shortest path to the right boundary. We do not have a good way of 
determining how well the computed policy approximates an optimal one. In this 
simulation run, the jaggedness in the rover's path indicates some amount of sub- 
optimality. However, the rover's overall behavior is reasonable. When the sensor 
noise in the model is increased (Fig. |4fi), the rover maintains roughly the same 
behavior, but it must perform many more sensing actions to determine whether a 
rock is valuable. When the movement noise is increased (Fig-H];), the rover decides 
that it is too risky to pass through the narrow space between obstacles and takes an 
alternative safer path. 

A standard discrete POMDP model of Rock Sample uses a grid map of the en- 
vironment. Typically discrete POMDP algorithms can handle a 10 x 10 grid in rea- 
sonable time. This is inadequate for complex geometric environments. The envi- 
ronment shown in Fig. IH which consists of relatively simple geometry, requires a 
grid of roughly 50 x 50, due to closely spaced obstacles. A discrete POMDP model 
of this size requires about 4 GB of memory before any computation is performed. 
MCVI avoids this difficulty completely. 

4.4 Discussion 



While the experimental results are preliminary, the three different examples indi- 
cate that MCVI is flexible and relatively easy to use. It does not require artificial 
discretization of a continuous state space as a grid. It also does not impose restric- 
tion on the parametric form of the model. 
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Our current implementation of MCVI uses fixed values for the number of par- 
ticles, M, for belief representation and the parameter N in MC-Backup. Our ex- 
perimental results show that MC-Backup typically takes around 99% of the total 
running time and is the dominating factor. To improve efficiency, we may use the 
sample variance of the simulations to set N adaptively and stop the simulations as 
soon as the variance becomes sufficiently small. We may over-estimate M, as this 
does not affect the total running time significantly. 



5 Conclusion 

POMDPs have been successfully applied to various robot motion planning tasks 
under uncertainty. However, most existing POMDP algorithms assume a discrete 
state space, while the natural state space of a robot is often continuous. This paper 
presents Monte Carlo Value Iteration for continuous-state POMDPs. MCVI sam- 
ples both a robot's state space and the corresponding belief space, and computes a 
POMDP policy represented as a finite state controller. The use of Monte Carlo sam- 
pling enables MCVI to avoid the difficulty of artificially discretizing a continuous 
state space and make it much easier to model robot motion planning tasks under 
uncertainty using POMDPs. Both theoretical and experimental results indicate that 
MCVI is a promising new approach for robot motion planning under uncertainty. 

We are currently exploring several issues to improve MCVI. First, the running 
time of MCVI is dominated by MC simulations in MC-backup. We may group sim- 
ilar states together and avoid repeated MC simulations from similar states. We may 
also parallelize the simulations. Parallelization is easy here, because all the simu- 
lations are independent. Second, the size of a policy graph in MCVI grows over 
time. We plan to prune the policy graph to make it more compact |6|. Finally, an 
important issue is to deal with not only continuous state spaces, but also continuous 
observation and action spaces. 
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Randomized Belief- Space Replanning 
in Partially-Observable Continuous 
Spaces 



Kris Hauser 



Abstract. We present a sample-based replanning strategy for driving 
partially-observable, high-dimensional robotic systems to a desired goal. At 
each time step, it uses forward simulation of randomly-sampled open-loop 
controls to construct a belief-space search tree rooted at its current belief 
state. Then, it executes the action at the root that leads to the best node in 
the tree. As a node quality metric we use Monte Carlo simulation to estimate 
the likelihood of success under the QMDP belief-space feedback policy, which 
encourages the robot to take information-gathering actions as needed to reach 
the goal. The technique is demonstrated on target-finding and localization 
examples in up to 5D state spacess. 



1 Introduction 

Many robotics problems involve planning in uncertain, partially-observable 
domains, which requires reasoning about how hypothetical state distribu- 
tions, belief states, change over time as the robot acts upon the world and 
gathers information with its sensors. Although this has been studied heavily 
in discrete domains, most realistic robotics problems have continuous, high- 
dimensional state spaces with nonlinear dynamics, which places them far 
out of the reach of tractability for state-of-the-art planners built for discrete 
systems. Although recent techniques have made progress in addressing con- 
tinuous systems assuming Gaussian process and observation noise [3|, ll9|, |21| , 



the more general case of nonlinear and multi-modal belief states have proven 
to be much more challenging, in large part because of the difficulty of repre- 



senting policies over an infinite-dimensional belief space 20L |22|. 
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Fig. 1 An execution trace of a robot (large circle) searching for a wandering target 
(pink circle) in the unit square. The robot's sensor has a 0.25 unit range. The current 
belief state is represented by 100 particles (dots) and the current plan (orange) is 
updated by replanning. 



We present a Randomized Belief-Space Replanning (RBSR) technique that 
addresses an undiscounted and cost-free version of the continuous partially- 
observable Markov decision process (POMDP) formulation, where the belief 
state must be driven to a goal region. Rather than solving the POMDP 
once and using the solution as a lookup table, RBSR repeatedly generates 
coarse plans, executes the first step, and uses sensor feedback to refine future 
plans (Figure [T]) . Much like a receding-horizon controller or model predictive 
controller, its success rate and computation time depend on the exploration 
strategy used to generate the search trees, and the evaluation function used 
to pick the "best" plan. 

The RBSR exploration strategy performs a forward search in belief space 
by randomly sampling open-loop actions, and for an evaluation function we 
estimate the success rate of a QMDP-like policy. QMDP is a heuristic strategy 
that descends the cost-to-go function of the underlying MDP, averaged over 
the belief state [16| , and it works well when state uncertainty is low, but with 
high uncertainty it may fall into local minima because it fails to perform 
active information-gathering. Hence, RBSR's random exploration strategy 
discourages information loss and encourages information-gathering actions 
because they improve the likelihood that QMDP succeeds. 

RBSR employs random sampling approaches at multiple points in the 
procedure — random belief-space exploration strategies, particle filtering for 
state estimation, probabilistic roadmap-like approaches in the QMDP policy 
evaluation, and Monte-Carlo simulation in the evaluation function — making 
it highly parallelizable and applicable to high-dimensional spaces. In prelim- 
inary experiments, we applied RBSR to a simulated target pursuit problem 
with a 4-D state space and localization problems in up to 5-D (Figure [1]). In 
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our tests, RBSR computes each replanning step in seconds, and drives the 
belief state to a solution with high probability. Though our current imple- 
mentation is promising, it is not as reliable in 6D or higher because it becomes 
much more difficult to maintain accurate belief estimates over time. Never- 
theless, we anticipate that future implementations of RBSR will be capable 
of solving many real- world robotics problems. 

2 Related Work 

Optimal planning in partially-observable problems is extremely computation- 
ally complex and is generally considered intractable even for small discrete 
state spaces |l7| . Approximate planning in discrete spaces is a field of active 
research, yielding several techniques based on the point-based algorithms de- 
vised by Kearns et al [I4I and Pineau et al (2003) 18]. For example, the 
SARSOP algorithm developed by Kurniawati et al (2008) has solved prob- 
lems with thousands of discrete states in seconds [1J| . 

Hypothetically, these algorithms can be applied to continuous problems by 
discretizing the space. But because of the "curse of dimensionality" , any regu- 
lar discretization of a high-dimensional space will requires an intractably large 
number of states. Porta et al (2006) has made progress in extending point- 
based value iteration to the continuous setting by representing belief states as 
particles or mixtures of Gaussians [20|. Thrun (2000) presented a technique 
that also works with continuous spaces by combining particle filtering with re- 
inforcement learning on belief states [2^ . For both of these methods, the need 
to approximate the value function over the infinite-dimensional belief space 
(either using alpha-vector or Q-value representations, respectively) comes at 
a high computational and memory expense. We use similar representations, 
but because we use replanning to avoid explicit policy representation, our 
approach sacrifices near-optimality for reduction in computational expense. 

Several recently developed algorithms attempt to address continuous spaces 
by leveraging the success of probabilistic roadmaps (PRMs) in motion plan- 



ning [l]| , which build a network of states sampled at random from the config- 
uration space. Alterovitz et al (2007) present a Stochastic Motion Roadmap 
planner for continuous spaces with motion uncertainty, which solves an MDP 
using the discretization of state space induced by a PRM [ij] . The techniques 
of Burns and Brock (2007) and Guibas et al (2008) augment roadmaps with 
edge costs for motions that have high probability of being in collision, and 
respectively address the problems of localization errors and environment sens- 
ing errors [J, LZf. Huang and Gupta (2009) address planning for manipulators 
under base uncertainty by associating probabilistic roadmaps with particles 
representing state hypotheses and searching for a short path that is likely to 
be collision free [10|. 

Another set of related approaches use assumptions of Gaussian observation 
and process noise, which makes planning much faster because probabilistic 
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Fig. 2 Illustrating the replanning steps, (a) A belief-space search tree is initialized 
with the current belief state, (b) The tree is grown using random exploration of 
open-loop motions, (c) Nodes in the tree are scored with estimates of the hkelihood 
of success under the QMDP policy. Traces of 5 belief-state particles under QMDP 
simulation are depicted, (d) The best plan is executed. (If the best node is the root, 
QMDP is executed by default). 



inference can be performed in closed form. The Belief Roadmap technique 
of Prentice and Roy (2009) computes a roadmap of belief states under both 
motion and sensing uncertainty, under the assumptions of Gaussian uncer- 
tainty and linear transition and observation functions 2l| . van den Berg et al 
(2010) consider path planning while optimizing the likelihood that a path is 
collision-free, under the assumption that a Linear-Quadratic-Gaussian feed- 
back controller is used to follow the path. Piatt et al (2010) and du Toit and 
Burdick (2010) construct plans using a maximum-likelihood observation as- 
sumption, and correcting for observation errors by replanning 6_, 19] . RBSR 
also uses a replanning strategy, but uses a particle-based uncertainty repre- 
sentation that is better at handling nonlinear and multi-modal distributions, 
and makes no assumptions on the type of observations received. 

The Randomized Path Planner (RPP) was an early approach in path plan- 
ning in high-dimensional spaces that uses the principle that reactive policies 
often work well when the system is near the goal or when a space is rela- 
tively free of obstacles [2] ■ RPP plans by alternating steps of potential field 
descent and random walks to escape local minima, and was surprisingly ef- 
fective at solving path planning problems that were previously considered 
intractable. RBSR shares a similar philosophy, but addresses problems with 
partial observablility. 



3 Problem Definition 



RBSR is given a POMDP-like model of the problem as input, and it inter- 
leaves planning and execution steps much like a receding-horizon controller. 
Each iteration performs the following steps: 
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1. The current sensor input is observed, and the robot's behef state is up- 
dated using a particle filter. 

2. The planner generates a truncated search tree rooted at the current belief 
state. (Figure [2la-b) 

3. The robot executes the action associated with the "best" branch out of 
the root node. (Figure Oc-d) 

This section describes the POMDP formulation, particle filtering belief state 
update, and the QMDP policy that is used to evaluate the quality of nodes 
in the tree. 



3.1 POMDP Modeling 

The problem is formalized as an undiscounted partially-observable Markov 
decision process (POMDP) over a set of states S, actions A, and observations 
O. S, A, and O are treated as subsets of Cartesian space, although this is not 
strictly necessary. A belief state is defined to be a probability distribution over 
S. We address the setting where the robot starts at an initial belief state binit 
and wishes to reach a goal set Q C S with high probability. We treat obstacles 
by moving all colliding states to a special absorbing state. At discrete time 
steps the robot performs an action, which changes its (unobserved) state, and 
it receives an observation. 

Although most POMDP formulations are concerned with optimizing 
rewards and action costs, we treat a somewhat simpler problem of simply 
maximizing the probability of reaching the goal. We also do not consider 
discounting. Discounting is numerically convenient and has a natural inter- 
pretation in economics, but in many respects is inappropriate for robotics 
problems because it gives preference to short term rewards. 

The dynamics of the system are specified in the transition model T : s,a —> 
s' that generates a new state, given an existing state and an action. The 
sensor model is specified in the sensor model O : s —> a that generates an 
observation given a state. These models are stochastic, and we let the notation 
s' <— T(s, a) and a <— 0{s) denote sampling at random from the posterior 
distributions of T and O, respectively. 



3.2 Simulation and Filtering with Belief Particles 

To approximate the distribution over state hypotheses, we represent a belief 
state 6 as a weighted set of n particles {{w''^\ s^^^), . . . , {■w^"\ s^"-*)}, where n 
is a parameter, and the weights sum to 1. Using such a representation, we can 
easily simulate an observation o ^ 0(s'^'^^) by sampling particle (ii;'^*^\ s''^^) 
proportional to its weight. We also define functions that compute the succes- 
sor belief state after executing action a: 
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Fig. 3 The QMDP policy will succeed for well-localized belief states (a,b) , but it 
may fall into local minima for a poorly localized belief state (c). On the other hand, 
QMDP allows the robot to incorporate new information during execution. So, if it 
could sense the corner of the obstacle as a landmark, then QMDP will also reach 
the goal (d). 



r(&,a) = {(«;W,T(.W,a))}r=i 
And the posterior belief state after observing o: 

Filter(&, o) = {(-wWPr(0(s(')) = o), s^'^)}^^! 



(1) 



(2) 



where Z is a normalization factor that ensures weights sum to 1. 

We assume the robot performs state estimation using a particle filter, 
which have many variants that are beyond the scope of this work. We re- 
fer the reader to the survey in [5|] for details. Most of these techniques ad- 
dress the problem of sample impoverishment that arises when few particles in 
b are consistent with a given sequence of observations. For the remainder of 
this paper, we will assume that the chosen filter is robust enough to maintain 
sufficiently representative belief states. 



3.3 QMDP Policy 

As an endgame strategy, RBSR uses the incomplete QMDP policy that is 
quite successful in practice for highly-localized belief states or when informa- 
tion can be gathered quickly to localize the state (see Figure [Sj . QMDP is 
also used in RBSR to define a function f{b) that measure the quality of hy- 
pothetical belief states by simulating how well QMDP makes progress toward 
the goal. 

The QMDP policy essentially takes the optimal action assuming full ob- 
servability is attained on the next step [16] . Suppose we are given a complete 
cost-to-go function Cfo the fully-observable version of the problem. We will 
put aside the question of how to compute such a function until Section 13.41 
and currently we describe how to use Cfo to derive a QMDP controller for 
the partially-observable belief space. 
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The belief-space policy 7rQMDp(&) is defined to descend the expected value 
of Cfo over the distribution of states in b. More precisely, we define 

n 

C{b) EE E,^b[Cfo{s)] « J2 ^^'^Qo(s^''), (3) 

i=l 

and define ttqmdp to pick the action that descends C(b) as quickly as possible: 
7rQMDp(fe) =argmin C(T(&, a)). (4) 

a 

If the expected value of the resulting belief state does not reduce C(&), we 
define ttqmdp (&) to return "terminate". CoUision states are assigned infinite 
potential. In practice, we compute the argmin in ^ by sampling many ac- 
tions and picking the one that minimizes the RHS. 

The QMDP policy alternates state estimation and potential field descent 
using the following feedback controller: 



QMDP 

Input: belief state 6o- 

1. For t = 1,2,..., do: 

2. Sense observation ot 

3. bo ^ Filter(6i_i,ot) 

4. at = TTQMDP (&o) 

5. If at = "terminate," stop. Otherwise, execute a*. 

6. bt^T{bo,at) 

QMDP is also used in RBSR to measure quality of future belief states. We 
define a belief-state evaluation function f{b) that uses Monte-Carlo simula- 
tion of QMDP on a holdout set of m particles {s^^-*, . . . s^™'} from b which 
are used to simulate "ground truth" . The complement of the holdout set b' 
is used as the initial belief state. For each test sample s*-*', QMDP is invoked 
from the initial belief state bo — b' , and sq = s^'^ is used for simulating the 
"true" observation ©(s*^*^) (Line 2). It is also propagated forward along with 
the belief state using the transition model St <— T{st~i,at) (Line 6). This 
continues until termination. 

To enforce an ordering on f{b) (with higher values better), we incorpo- 
rate two results of the QMDP simulation: s, the fraction of terminal states 
St that lie in the goal Q, and c, the average QMDP value function evalu- 
ated at the terminal belief states C{bt). We prioritize success rate s over the 
value function c by letting /(&) return a tuple (s, — c). To compare the tuples 
/(6i) = (si,— ci) and f{b2) = (s2,~C2) we use lexicographical order; that 
is, the value function is used only break ties on success rate. (This usually 
occurs when all locally reachable belief states have zero success rate.) 
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3.4 Computing Value Functions for the 
Fully- Observable Problem 

Let us now return to the question of how one might provide a potential field 
Cfo for the fully-observable version of the input POMDP. The policy that 
descends Cfo is assumed to be complete, that is, if state is fully observable, 
then a descent of Cfo is guaranteed to reach the goal. Although in discrete 
POMDPs such a function can be computed using value iteration on the un- 
derlying MDP, the problem is more difficult in continuous POMDPs. 

In problems with no motion uncertainty, Cfo is simply a cost function 
of the underlying motion planning problem. This can sometimes be com- 
puted analytically; e.g., for a robot with unit bounds on velocity in a con- 
vex workspace, Cfo is the straight-line distance to the goal. For more com- 
plex problems with high-dimensional or complex state spaces, approximate 
methods may be needed. In our examples we use a Probabilistic Roadmap 
(PRM) [ll|] embedded in S, where each point in space is identified with its 
closest vertex in the roadmap, and the shortest distance from each vertex to 
the goal is computed using Dijkstra's algorithm. We build the PRM with a 
sufficiently large number of samples such that shortest paths in the roadmap 
approximate shortest paths in S. By caching the shortest distance for each 
PRM vertex, computing Cfo{s) runs in logarithmic time using a K-D tree to 
lookup the vertex closest to s. 

This PRM-based potential field assumes that velocities can be chosen in 
any direction and with unit cost, but can be adapted to handle differentially- 
constrained systems using other sample-based motion planners. If actions are 
stochastic, a potential based on the Stochastic Motion Roadmap [l|] might 
yield better results. We leave such extensions to future work. 



4 Randomized Belief Space Replanning 

The replanning algorithm used by RBSR grows a belief tree T whose nodes 
are belief states b e B, and the edges store open-loop actions. 



Randomized Belief Space Replanning 

Input: Current belief state 60, current plan ai, ...,«(. 

1. Initialize T with the belief states in the plan starting from 60 • 

2. For i = l,...,iV, do: 

3. Pick a node b in T and an action a. 

4. If b' = T{b, a) is feasible, add b' to T as a child of b. 

5. End 

6. Sort the nodes in T in order of decreasing EIC{b). 

7. For the M best nodes in T, evaluate f{b). 

8. Return the plan leading to the node with the highest f{b). 



Randomized Belief-Space Replanning 201 

In Line 3 we use a Voronoi exploration strategy to quickly distribute nodes 
across belief space. Lines 6-7 are used to avoid evaluating / on all nodes on 
the tree, because it is an relatively expensive operation. We use an expected 
information gain score EIG(b) to restrict the evaluations of / to a small sub- 
set of nodes M << N. Because EIG{b) is less expensive than / to compute, 
this strategy leads to major speed gains. These strategies are described in 
greater detail below. 

4-1 Voronoi- Biased Exploration Strategy 

The exploration strategy is designed to cover the space of reachable open-loop 
motions quickly, and we use a Voronoi-biasing strategy much like the Rapidly- 
Exploring Random Tree (RRT) motion planner [l5|] . To expand the tree, we 
sample a random target point stgt from the state space 5, and sample a set of 
representative particles from all belief states in the tree R — {s\b ^ T,s ^ b}. 
Then, we find the closest point s from R to stgt- We then find a control a 
action that brings s closer to stgt- 

4-2 Expected Information Gain Scoring Strategy 

We use an expected information gain strategy to avoid running expensive 
evaluations of / on belief states that are unlikely to yield improvements in 
/. The intuition is that information gain is a sort of proxy score for QMDP 
favorability because it measures the spread of a belief state distribution, and 
QMDP tends to succeed more when states are localized. We compute the 
expected information gain for a belief state b as follows. The information gain 
of the observation o is the KuUback-Leibler divergence between the posterior 
distribution bo = Pr{s\o, b) and the prior b = Pr{s\b): 

/(.J»).£^P,.(*,.),„g:^. (5) 

Given a particle representation of belief states bo and 6, we replace the dis- 
tribution Pr(s\b) using a kernel density estimator with Gaussian kernels cen- 
tered on the particles in b, and approximate the integral by the weighted sum 
over the particles s*^*^ in bo- 

The expected information gain is simply the expectation of ([5]) over o: 

EIG(b)^ f Pr{o\b)I{bo\\b) (6) 

Joeo 

To compute this, we compute the observation o''' = 0{s^^>) for each par- 
ticle in b, perform particle filtering bo = Filter{b,o^^'), and then compute 
the weighted average of ([5]) over all particles. Although EIG is an 0{n^) 
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computation, in our experiments it is typically orders of magnitude faster 
than computing /, and this scoring stage leads to major speedups. 

4-3 Complexity and Convergence 

The time complexity of RBSR depends on several parameters: the number of 
belief space particles n, the number of holdout particles for QMDP simulation 
m, the number of exploration steps N, and the number of nodes retained for 
QMDP evaluation M. Assume that an evaluation of 7rQMDp(&) © takes time 
0{n). Then, the exploration stage takes time 0{nN'^), the EIG scoring takes 
time 0{n^N), and the evaluation stage takes time 0{TnmM) where T is 
the average number of steps taken by QMDP before it terminates. But the 
running time of the evaluation stage hides a higher constant factor because it 
uses more expensive operations such as state and path collision checking, and 
in our experiments it dominates running time. Space complexity is 0{{n + 
m)N). 

The parameter n affects how accurately RBSP tracks and predicts belief 
states using the particle filter, and should be set high enough to attain a 
desired accuracy. In our experiments we do a small amount of tuning to find 
a reasonable parameter, m affects how accurately RBSP predicts the success 
rate of the QMDP policy, and f{b) may be quite noisy for low m. Specifically, 
the variance of the success rate estimate p is bounded by p(l — p)/m., and m 
should be chosen to achieve a desired accuracy. Parameters N and M affect 
the chance that RBSR makes progress toward the goal in a single time step. 
We used parameters N — 100 and M = 10 in our experiments. 



5 Experimental Results 

We performed experiments on two scenarios: a 2D pursuit scenario with a 
4D state space, as well as a localization scenario that has tunable dimen- 
sion. Although these problems are not difficult to solve using special-purpose 
strategies, they pose a challenge for general-purpose planners to solve in a 



^, 



;'^ 





Fig. 4 Execution traces of the pursuit example for four different initial target 
locations (purple circles). The robot uses a distance sensor with maximum range 
0.25. 
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reasonable amount of time and memory. For example, the SARSOP plan- 
ner [lj| can approximately solve a coarsely discretized version of the pursuit 



scenario in a few minutes, but it exhausts our test machine's 2Gb of memory 
once the resolution of the workspace grid exceeds 15x15. 



5.1 Pursuit Scenario 

Our first set of experiments consider a pursuit scenario in the unit square 
where the robot must reach a slower target that moves at random (Figure [T|). 
The position of the robot is observable and controlled precisely, but it cannot 
sense the target outside a circle of radius 0.25. The target's position is a 
uniform distribution in the initial belief state, and the goal condition is to 
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Fig. 5 An execution trace of a robot localizing itself to reach the red circle with 
high probability. Its sensor measures the distance to the walls, and has maximum 
range 0.05 (dashed lines). The current belief state is represented by 100 particles 
(dots) with a covariance ellipsoid, and the current plan (orange) is updated by 
replanning. 
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Fig. 6 A robot localizing itself using a proximity sensor in a space with obstacles. 



achieve a distance of 0.05 to the target. We tested three sensor models: 1) a 
position sensor that reports the target's x, y position relative to the robot, 
2) a direction sensor that reports only direction and not distance, and 3) a 
distance sensor that does not report direction. 

Using preliminary experiments we tuned the number of particles in the be- 
lief state needed for accurate particle filtering, and found that 100 particles 
were sufficient for the position and direction sensor, and 200 particles were 
needed for the proximity sensor. So, we used m = 50 particles as a holdout 
set, and n — 150 and n — 250, respectively, for the position/direction sen- 
sors and the proximity sensor. In 25 trials on each of these problems, with 
random target start states, RBSP never failed to reach the target. Several 
execution traces for different initial target positions are drawn in Figure 31 
Average path length is approximately 1.7, which is close to the expected path 
length computed by SARSOP on a 15 x 15 grid, but is still suboptimal. Each 
replanning iteration took about 15 s on average, with standard deviation 10 s. 
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Fig. 7 Left: a partial localization execution using only 5 holdout particles. Because 
the evaluation function is noisy, the plan is often drastically revised and the walls 
have not yet been sensed after 30 steps. Right: by initiating replanning only when 
information gain exceeds a threshold, the path is smoother and two walls have been 
sensed within 30 steps. 




Fig. 8 In localization problems up to 5 dimensions the number of replanning steps 
scales roughly linearly. Columns report average, mininmm, and maximum steps 
over 10 trials. 



5.2 Localization Scenario 



In our second scenario a robot is in an unknown configuration in a known 
d-dimensional environment and must localize itself and reach a small goal by 
measuring the distance to obstacles. The sensor has a limited range, which 
requires that the robot perform several steps of active sensing before reaching 
the goal. The optimal strategy is to proceed toward a wall until the sensor 
returns a reading, and then proceed to an adjacent wall until a closer reading 
is obtained, and so on until it achieves d readings from d linearly independent 
walls. Note that RBSR does not have a "proceed until" action in its action 
set, so instead it must approximate such a policy by a sequence of conditional 
movement actions and sensing actions. 

In the first experiment (Figure [5]), we set d = 2, 5 is the unit square, the 
initial belief state is a circular Gaussian distribution with standard deviation 
0.1, and the goal radius and the sensing radius are both set to 0.05. To 
represent belief states we used 150 particles with a holdout set of 50. We 
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also tested a space containing obstacles (Figure |B|. In both examples, RBSR 
performs localization by moving close to obstacle boundaries, in somewhat 
random fashion, until it senses nearby walls. This continues until sufficient 
data is gathered to reach the goal. 

We tested the effects of reducing the size of the holdout set to, and what we 
found was that the resulting executions tend to be much more noisy due to 
spurious noise in /(&). In such cases, we found better results when replanning 
is initiated only when the current belief state experiences a large information 
gain due to an incoming observation (Figure [T]). We hope to explore this 
strategy further in future work. 

Our final set of experiments tested scalability with respect to dimension. 
Figure [5] plots the number of replanning steps taken by RBSR in problems 
from d == 3 to d = 5 in the unit hypercube. We increased the number of 
particles to 500, but kept all other parameters unchanged from the experiment 
in Figure [5l These experiments suggest that the number of replanning steps 
is roughly linear in dimension. Running time per timestep is roughly linear 
in dimension as well, ranging from approximately 6 s in the 3D case up to 
approximately 14 s in the 5D case. In higher dimensions, the accuracy of the 
particle filter dropped off sharply. In future work we hope to explore more 
sophisticated belief state representations, like Gaussian mixture models, that 
can maintain accuracy with a manageable number of particles. 



6 Discussion: Exploration Strategies 

The experiments above are preliminary but promising, and in future work we 
would like to study RBSR's theoretical performance in the face of approx- 
imate belief states and randomization in the exploration strategy. In this 
section we argue why we expect that RBSR will work well in a broader class 
of problems; particularly those in which 1) random walks in belief space have 
a significant probability of finding useful information, and 2) in the process 
of information-gathering, uncertainty is not significantly increased. 

Under these assumptions, RBSR is roughly a belief-space analogue to the 
Randomized Path Planner (RPP) j2|], which addresses path planning in a 
deterministic, fully-observable environment by alternating potential field de- 
scent with random walks to escape local minima. RBSR is, however, better 
than RPP for two reasons: 1) it perform many walks in simulation only and 
then picks the best one for execution, and 2) it performs many walks in par- 
allel using the Voronoi bias heuristic, which is more efficient at exploring 
belief space than a random walk. So, we should be able to show that RBSR 
performs at least as well as RPP, which is probabilistically complete. 

Another interpretation is that RBSR uses macro-actions to make planning 
more efficient. The idea of macro-actions have existed for some time in the 
discrete POMDP literature as a way to reduce the exploration breadth and 
depth in large robotics problems [161 . For example, Hsiao et. al. addressed 
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a robot grasping problem using specially constructed macro-actions that ei- 
ther provide information or seek the goal [9|]. They demonstrate that if un- 
certainty grows slowly during information-gathering, then forward planning 
can be limited to depth one. RBSR can also be interpreted as depth-one 
forward planning, using the QMDP policy as a goal-seeking macro-action 
and belief-space sampling to produce information-gathering macro-actions 
on the fly. Two other recent works have also tackled the problem of con- 
structing macro-actions automatically and with increasing granularity during 
forward planning [3, [fj] . These approaches are limited to macro-actions that 
reach various states as subgoals, and we suspect that RBSR constructs better 
information-gathering macro-actions using belief space criteria; on the other 
hand we also suspect that the approaches in [SJ, |l3| construct more optimal 
plans by searching to a greater depth. (Note that our current presentation 
of RBSR does not incorporate action costs; future implementations may in- 
corporate path cost during the selection of information-gathering paths.) It 
remains an open question whether these varied approaches will yield problem- 
independent principles for generating and exploiting macro-actions in both 
discrete and continuous POMDPs. 

7 Conclusion 

This paper presented preliminary work in a Randomized Belief-Space Re- 
planning (RBSR) technique for partially-observable problems in continuous 
state spaces. It constructs partial plans by sampling open-loop actions at 
random, and by evaluating the quality of future belief states by simulating a 
QMDP-like policy that performs well when the state is well-localized. By it- 
eratively incorporating sensor feedback from plan execution and replanning, 
RBSR avoids having to compute a policy over large belief spaces. Experi- 
ments show that it solves a target pursuit problem with a 4D state space and 
a localization problem in 2D-5D state spaces relatively efficiently. 

Future work should attempt to formally characterize convergence rates of 
RBSR and perform experimental comparisons against established techniques 
for discrete POMDPs. Future benchmark development for partially observ- 
able continuous problems would aid the empirical study of planner sensitivity 
to dimensionality and other belief space properties. We also intend to address 
improving path optimality and using sensing more efficiently in the RBSR 
framework, because randomization yields somewhat jerky plans. With ad- 
ditional refinements, RBSR-like approaches may lead to breakthroughs in 
planning under partial observability in realistic robotic systems. 
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GPU-Based Parallel Collision Detection for 
Real-Time Motion Planning 



Jia Pan and Dinesh Manocha 



Abstract. We present parallel algorithms to accelerate collision queries for sample- 
based motion planning. Our approach is designed for current many-core GPUs and 
exploits the data-parallelism and multi-threaded capabilities. In order to take advan- 
tage of high number of cores, we present a clustering scheme and collision-packet 
traversal to perform efficient collision queries on multiple configurations simul- 
taneously. Furthermore, we present a hierarchical traversal scheme that performs 
workload balancing for high parallel efficiency. We have implemented our algo- 
rithms on commodity NVIDIA GPUs using CUDA and can perform 500, 000 colli- 
sion queries/second on our benchmarks, which is lOX faster than prior GPU-based 
techniques. Moreover, we can compute collision-free paths for rigid and articulated 
models in less than 100 milliseconds for many benchmarks, almost 50-lOOX faster 
than current CPU-based planners. 

1 Introduction 

Motion planning is one of the fundamental problems in algorithmic robotics. The 
goal is to compute collision-free paths for robots in complex environments. Some of 
the widely used algorithms for high-DOF (degree-of-freedom) robots are based on 
randomized sampling. These include algorithms based on PRMs |fT2l| and RRTs lfT4l . 
These methods tend to capture the topology of the free configuration space of the 
robot by generating a high number of random configurations and connecting nearby 
collision-free configurations (i.e. milestones) using local planning methods. The re- 
sulting algorithms are probabilistically complete and have been successfully used to 
solve many challenging motion planning problems. 

In this paper, we address the problem of designing fast and almost real-time plan- 
ning algorithms for rigid and articulated models. The need for such algorithms arises 
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not only from virtual prototyping and character animation, but also task planning for 
physical robots. Current robots (such as Willow Garage's PR2) tend to use live sen- 
sor data to generate a reasonably accurate model of the objects in the physical world. 
Some tasks, such as robot navigation or grasping, need to compute a collision-free 
path for the manipulator in real-time to handle dynamic environments. Moreover, 
many high-level task planning algorithms perform motion planning and subtask ex- 
ecution in an interleaved manner, i.e. the planning result of one subtask is used to 
construct the formulation of the following subtask [27 1. A fast and almost real-time 
planning algorithm is important for these applications. 

It is well known that a significant fraction (e.g. 90% or more) of randomized 
sampling algorithms is spent in collision checking. This includes checking whether 
a given configuration is in free-space or not as well as connecting two free-space 
configurations using a local planning algorithm. While there is extensive literature 
on fast intersection detection algorithms, some of the recent planning algorithms are 
exploiting the computational power and massive parallelism of commodity GPUs 
(graphics processing units) for almost real-time computation Il23ll22ll . Current GPUs 
are high-throughput many-core processors, which offer high data-parallelism and 
can simultaneously execute a high number of threads. However, they have a different 
programming model and memory hierarchy as compared to CPUs. As a result, we 
need to design appropriate parallel collision and planning algorithms that can map 
well to GPUs. 

Main Results. We present a novel, parallel algorithm to perform collision queries 
for sample-based motion planning. Our approach exploits parallelism at two levels: 
it checks multiple configurations simultaneously whether they are in free space or 
not and performs parallel hierarchy traversal for each collision query. Similar tech- 
niques are also used for local planning queries. We present clustering techniques 
to appropriately allocate the collision queries to different cores. Furthermore, we 
introduce the notion of collision-packet traversal, which ensures that all the config- 
urations allocated to a specific core result in similar hierarchical traversal patterns. 
The resulting approach also exploits fine-grained parallelism corresponding to each 
bounding volume overlap test to balance the workload. 

The resulting algorithms have been implemented on commodity NVIDIA GPUs. 
In practice, we are able to process about 500,000 collision queries per second on 
a $400 NVIDIA GeForce 480 desktop GPU, which is almost lOX faster than prior 
GPU-based collision checking algorithms. We also apply our collision checking 
algorithm for GPU-based motion planners to high DOF rigid and articulated robots. 
The resulting planner can compute collision-free paths in less than 100 milliseconds 
for various benchmarks and appears to be 50-lOOX faster than CPU-based planners. 

The rest of the paper is organized as follows. We survey related work on real-time 
motion planning and parallel collision-checking algorithms in Section [21 Section [3] 
gives an overview of our approach and we present our new parallel algorithm for 
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parallel collision queries in Section ID We highlight the performance of our algo- 
rithm on different benchmarks in Section [5] 



2 Previous Work 

In this section, we give a brief overview of prior work in real-time motion planning 
and parallel algorithms for collision detection. 

2.1 Real-Time Motion Planning 

An excellent survey of various motion planning algorithms is given in IfTTl . Many 
parallel algorithms have also been proposed for motion planning by utilizing the 
properties of configuration spaces (20). The distributed representation |51 can be 
easily parallelized. In order to deal with very high dimensional or difficult planning 
problems, distributed sampling-based techniques have been proposed |25|. 

The computational power of many-core GPUs has been used for many geomet- 
ric and scientific computations ||2T]| . The rasterization capabilities of a GPU can 
be used for real-time motion planning of low DOF robots ifTOl l26l or improve the 
sample generation in narrow passages Il24i r7]|. Recently, GPU-based parallel motion 
planning algorithms have been proposed for rigid models Il23ll22ll . 

2.2 Parallel Collision Queries 

Some of the widely used algorithms for collision query are based on bound- 
ing volume hierarchies (BVH), such as A:-DOP trees, OBB trees, AABB trees, 
etc IfTSl . Recent developments include parallel hierarchical computations on multi- 
core CPUs ini im and GPUs ^. CPU-based approaches tend to rely on 
fine-grained communication between processors, which is not suited for current 
GPU-like architectures. On the other hand, GPU-based algorithms |[l6l use work 
queues to parallelize the computation on the multiple cores. All these approaches 
are primarily designed to parallelize a single collision query for sample-based mo- 
tion planning. 

The capability to perform a high number of collision queries efficiently is es- 
sential in motion planning algorithms, e.g. for parallel collision queries in mile- 
stone computation and local planning. Some of the prior algorithms perform paral- 
lel queries in a simple manner: each thread handles a single collision query in an 
independent manner 1231 [22l [3l [2ll . As current multi-core CPUs have the capability 
to perform multiple-instruction multiple-data (MIMD) computations, these simple 
strategies can work well on CPUs. On the other hand, current GPUs offer high 
data parallelism and the ability to execute a high number of threads in parallel to 
overcome the high memory latency. As a result, we need different parallel collision 
detection algorithms to fully exploit their capabilities. 
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3 Overview 

In this section, we first provide some background on current GPU architectures. 
Next, we address some issues in designing efficient parallel algorithms to perform 
collision queries. 

3. 1 GPU Architectures 

In recent years, the focus in processor architectures has shifted from increasing 
clock rate to increasing parallelism. Commodity GPUs such as NVIDIA Fermy 
have theoretical peak performance of Tera-FLOP/s for single precision computa- 
tion and hundreds of Giga-FLOP/s for double precision computations. This peak 
performance is significantly higher as compared to current multi-core CPUs, thus 
outpacing CPU architectures |fT9l at relatively modest cost of $300 to $400. How- 
ever, GPUs have different architectural characteristics and memory hierarchy, that 
impose some constraints in terms of designing appropriate algorithms. First, GPUs 
usually have a high number of independent cores (e.g. the newest generation GTX 
480 has 15 cores and each core has 32 streaming processors resulting in total of 
480 processors while GTX 280 has only 240 processors). Each of the individual 
cores is a vector processor capable of performing the same operation on several 
elements simultaneously (e.g. 32 elements for current GPUs). Secondly, the mem- 
ory hierarchy on GPUs is quite different from that of CPUs and cache sizes on the 
GPUs are considerably smaller. Moreover, each GPU core can handle several sep- 
arate tasks in parallel and switch between them in the hardware when one of them 
is waiting for a memory operation to complete. This hardware multithreading ap- 
proach is thus designed to hide the memory access latency. Thirdly, all GPU threads 
are logically grouped in blocks with a per-block shared memory, which provides 
a weak synchronization capability between the GPU cores. Overall, shared mem- 
ory is a limited resource on GPUs: increasing the shared memory distributed for 
each thread can limit the extent of parallelism. Finally, the threads are physically 
processed in chunks in SIMT (single-instruction, multiple-thread). This is different 
from SIMD (single-instruction multiple-data) and each thread can execute indepen- 
dent instructions. The GPU's performance can reduce significantly when threads in 
the same chunk diverge considerably, because these diverging portions are executed 
in a serial manner for all the branches. As a result, threads with coherent branch- 
ing decisions (e.g. threads traversing the same paths in the BVH) are preferred on 
GPUs in order to obtain higher performance IS). All of these characteristics imply 
that - unlike CPUs - achieving high performance in current GPUs depends on sev- 
eral factors: (1) generating a sufficient number of parallel tasks so that all the cores 
are highly utilized; (2) developing parallel algorithms such that the total number of 
threads is even higher than the number of tasks, so that each core has enough work to 
perform while waiting for data from relatively slow memory accesses; (3) assigning 
appropriate size for shared memory to accelerate memory accesses and not reduce 
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the level of parallelism; (4) performing coherent or similar branching decisions for 
each parallel thread within a given chunk. These requirements impose constraints in 
terms of designing appropriate collision query algorithms. 



3.2 Notation and Terminology 

We define some terms and highlight the symbols used in the rest of the paper. 

chunk. The minimum number of threads that GPUs manage, schedule and execute 
in parallel, which is also called warp in the GPU computing literatures. The size 
of chunk (chunk-size or warp-size) is 32 on current NVIDIA GPUs (e.g. GTX 
280 and 480). 

block. The collection of GPU threads that will be executed on the same GPU 
core. These threads synchronize by using barriers and communicate via a small 
high-speed low-latency shared memory. 

BVHa. The bounding volume hierarchy (BVH) tree for model a. It is a binary tree 
with L levels, whose nodes are ordered in the breadth-first order starting from the 
root node. Each node is denoted as B VHa [;'] and its children nodes are B VH„ [2;] 
and B VH„ [2/ + I] with 1 < / < 2^" ' - 1 . The nodes at the Z-th level of a BVH tree 
are represented as BVHa[A:],2' < k < 2'+^ — 1 with < I < L. The inner nodes 
are also called bounding volumes (B V) and the leaf nodes also have a link to the 
primitive triangles that are used to represent the model. 

BVTT„j;. The bounding volume test tree (BVTT) represents recursive collision 
query traversal between two objects a,b. It is a 4-ary tree, whose nodes are or- 
dered in the breadth-first order starting from the root node. Each node is de- 
noted as BVTT„ ,(;[;] = (BVH„[m], BVHi[M]) or simply {m,n), which checks 
the BV or primitive overlap between nodes BVHa[m] and BVH/,[«], while m = 
[i - 4^ J + 2^, n = {/ - ^} + 2^ and M = llog4{3i - 2) J . BVTT node 
{m,n)'s children are (2m, 2n), (2m,2«+ I), (2m+ I, 2m), (2m+ I,2m+ 1). 

q. A configuration of the robot, which is randomly sampled within the configura- 
tion space "^-Space. q is associated with the transformation Tq. The BVH of a 
model a after applying such a transformation is given as BVH„(q). 



3.3 Collision Queries: Hierarchical Traversal 

Collision queries between the geometric models are usually accelerated with hierar- 
chical techniques based on BVHs, which correspond to traversing the BVTT related 
with the BVHs of the models ifTSl . The simplest parallel algorithms to perform mul- 
tiple collision queries are based on each thread traversing the BVTT and checking 
whether a given configuration is in free space or not. Such a simple parallel algo- 
rithm is highlighted in Algorithm[T] This strategy is easy to implement and has been 
used in previous parallel planning algorithms based on multi-core or multiple CPUs. 
But it may not result in high parallel efficiency on current GPUs due to the follow- 
ing reasons. First, each thread needs a local traverse stack on the shared memory 
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which may not be effective for complex models with thousands of polygons. Sec- 
ond, different threads may traverse the BVTT tree with incoherent patterns: there 
are many branching decisions performed during the traversal (e.g. loop, if, return 
in the pseudo-code) and the traversal flow of the hierarchy in different threads di- 
verges quickly. Finally, different threads can have varying workloads; some may be 
busy with the traversal while the others may have finished the traversal early due 
to no overlap and are idle. These factors can affect the performance of the parallel 
algorithm. 

The problems of low parallel efficiency in Algorithm [T| become more severe in 
complex or articulated models. For such models, there are longer traversal paths 
in the hierarchy and the difference between these paths can be large for different 
configurations. As a result, differences in the workloads between different threads 
can be high. For articulated models, each thread checks the collision status of all the 
links and stops when a collision is detected for any link. Therefore, more branching 
decisions are performed within each thread and this can lead to more incoherence. 
Similar issues also arise during local planning when each thread determines whether 
two milestones can be joined by a collision-free path by checking the collisions 
along the trajectory connecting them. 



Input: N random configurations {q,}^] , BVH„ for the robot and BVH^ for the obstacles 
Output: return whether one configuration is in free space or not 
tilt ■*— thread id of current thread 

< traverse stack 5[] is initialized with root nodes 

shared 5[] = local traversal stack 

S[] ^BVTT[1] = (BVH„(q)[l],BVH6[l]) 

■d traverse BVTT for BVH„(q) and BVH^ 

loop 

(x,y)^pop{S). 

if overlap(BVHa (q) [x] ,BVHi [y] ) then 

S[] ^ {2x,2y),{2x,2y+l),(2x+ l,2,y), {2x+ 1,2>'+ 1) if \isLeaf{x) &&UsLeaf{y) 
S[] ^ {2x,2y),{2x,2y+l) if isLeaf{x) && \isLeaf{y) 
5[] ^ {2x,2y),l2x+ \,2y) if UsLeaf{x) && isLeafiy) 

return collision if isLeaf{x) && isLeaf[y) && exactIntersect{B\Ha{(\)[x\,B\Hb'\y\) 
end if 
end loop 
return collision-free 



4 Parallel Collision Detection on GPUs 

In this section, we present two novel algorithms for efficient parallel collision check- 
ing on GPUs between rigid or articulated models. Our methods can be used to check 
whether a configuration lies in the free space or to perform local planning com- 
putations. The first algorithm uses clustering and fine-grained packet-traversal to 
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improve the coherence of BVTT traversal for different threads. The second algo- 
rithm uses queue-based techniques and lightweight workload balancing to achieve 
higher parallel performance on the GPUs. In practice, the first method can provide 
30%-50% speed up. Moreover, it preserves the per-thread per-query structure of the 
naive parallel strategy. Therefore, it is easy to implement and is suitable for cases 
where we need to perform some additional computations (e.g. retraction for han- 
dling narrow passages ||29]| ). The second method can provide 5-lOX speed up, but 
is relatively more complex to implement. 

4.1 Parallel Collision-Packet Traversal 

Our goal is to ensure that all the threads in a block performing B VTT-based collision 
checking have similar workloads and coherent branching patterns. This approach is 
motivated by recent developments related to interactive ray-tracing on GPUs for vi- 
sual rendering. Each collision query traverses the BVTT and performs node-node or 
primitive-primitive intersection tests. In contrast, ray-tracing algorithms traverse the 
BVH tree and perform ray-node or ray-primitive intersections. Therefore, parallel 
ray-tracing algorithms on GPUs also need to avoid incoherent branches and varying 
workloads to achieve higher performance. 

In real-time ray tracing, one approach to handle the varying workloads and inco- 
herent branches is the use of ray -packets ISKIl. In ray-tracing terminology, packet 
traversal implies that a group of rays follows exactly the same traversal path in 
the hierarchy. This is achieved by sharing the traversal stack (similar to the BVTT 
traversal stack in Algorithm[Tll among the rays in the same warp-sized packet (i.e. 
threads that fit in one chunk on the GPU), instead of each thread using an inde- 
pendent stack for a single ray. This implies that the same additional nodes in the 
hierarchy may be visited during ray intersection tests, even though there are no in- 
tersections between the rays and those nodes. But the resulting traversal is coherent 
for different rays, because each node is fetched only once per packet. In order to 
reduce the number of computations (i.e. unnecessary node intersection tests), all the 
rays in one packet should be similar to one another, i.e. have similar traversal paths 
with few differing branches. We extend this idea to parallel collision checking and 
refer to our algorithm as multiple configuration-packet method. 

The first challenge is to cluster similar collision queries or the configurations into 
groups. In some cases, the sampling scheme (e.g. the adaptive sampling for lazy 
PRM) can provide natural group partitions. However, in most cases we need suitable 
algorithms to compute these clusters. Clustering algorithms are natural choices for 
such a task, which aims at partitioning a set ,9^ ofN data items {x,}^j into K groups 
{Ck}f^i such that the data items belonging to the same group are more "similar" 
than the data items in different groups. The clustering algorithm used to group the 
configurations needs to satisfy some additional constraints: |Q.| = chunk-size, 1 < 
k < K, i.e. each cluster should fit in one chunk on GPUs, except for the last cluster 
and K = [ c-/,;,,,'! ;,ve 1 ■ Using the formulation of ^-means, the clustering problem can 
be formally described as: compute K = \ ^^^ j^ ^ .^^ ] items {ctjf^j that minimizes 
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EElx,eQl|x,--C,l|, (1) 

/=U=1 

with constraints |Q| = chunk-size, 1 < ^ < ^. To our knowledge, there are no clus- 
tering algorithms designed for this specific problem. One possible solution is clus- 
tering with balancing constraints |4|, whichhas additional constraints |Q| >m,l< 
k < K, where m < ^. 

Instead of solving Equation ([T]) exactly, we use a simpler clustering scheme to 
compute an approximate solution. First, we use ^-means algorithm to cluster the A'^ 
queries into C clusters, which can be implemented efficiently on GPUs Q. Next, 
for ^-th cluster of size Sk, we divide it into \-, — j^^l sub-clusters, each of which 

'^' I chunk-size I ' 

corresponds to a configuration-packet. This simple method has some disadvantages. 

For example, the number of clusters is xLi \ chimLize ^ ^ -^ = \ chunk-size ^ ^"'^ *^''^" 
fore Equation ([T]l may not result in an optimal solution. However, as shown later, 
even this simple method can improve the performance of parallel collision queries. 

Next we map each configuration-packet to a single chunk. Threads within one 
packet will traverse the BVTT synchronously, i.e. the algorithm works on one B VTT 
node {x,y) at a time and processes the whole packet against the node. If {x,y) is a 
leaf node, an exact intersection test is performed for each thread. Otherwise, the 
algorithm loads its children nodes and tests the BVs for overlap to determine the re- 
maining traversal order, i.e. to select one child {x,„,ym) as the next BVTT node to be 
traversed for the entire packet. We select {xm,ym) in a greedy manner: it corresponds 
to the child node that is classified as overlapping by the most threads in the packet. 
We also push other children into the packet's traversal stack. In case no BV overlap 
is detected in all the threads or {x,y) is a leaf node, (.»:,„, >'«i) would be the top ele- 
ment in the packet's traversal stack. The traversal step is repeated recursively, until 
the stack is empty. Compared to Algorithm[Tl all the threads in one chunk share one 
traversal stack in shared memory, instead of using one stack for each thread. There- 
fore, the size of shared memory used is reduced by chunk-size times and results in 
higher parallel efficiency. 

The traversal order described above is a greedy heuristic that tries to minimize 
the traversal path of the entire packet. For one BVTT node (x, y) , if the overlap is not 
detected in any of the threads, it implies that these threads will not traverse the sub- 
tree rooted at (.t, v). Since all the threads in the packet are similar and traverse the 
BVTT in nearly identical order, this implies that other threads in the same packet 
might not traverse the sub-tree either. We define the probability that the sub-tree 
rooted at {x,y) will be traversed by one thread as p^.v = *'''Tc!'Zt'-l'i-e'''' ■ ^°^ ^"y 
traverse pattern P for BVTT, the probability that it is carried on by BVTT traversal 
will be pp = Yl(x,y)eP Px.y As a result, our new traversal strategy guarantees that the 
traversal pattern with higher traverse probability will have a shorter traversal length, 
and therefore minimizes the overall path for the packet. 

The decision about which child node is the candidate for next traversal step is 
computed using sum reduction f9\, which can compute the sum of n items in par- 
allel with 0(log(M)) complexity. Each thread writes a 1 in its own location in the 
shared memory if it detects overlap in one child and otherwise. The sum of the 
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memory locations is computed in 5 steps for a size 32 chunk. The packet chooses 
the child node with the maximum sum. The complete algorithm for configuration- 
packet computation is described in Algorithm[2l 



9 
10 
11 
12: 
13 
14 
15 
16 
17: 
18: 
19 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30 
31 



Input: N random configurations {q,}Jli, BVH„ for the robot and BVH^ for the obstacles 
f,y ^- thread id of current thread 

q ^ q'w 

shared CN[]= shared memory for children node 
shared r5[]= local traversal stack 
shared SM[]= memory for sum reduction 

return if ove?-/ap(BVHn(q)[l], BVHj,[l]) is false for all threads in chunk 

loop 

if isLeaf(x) && isLeaf{y) then 

update collision status of q if exactlntersect (B VHq (q) [x] ,B VH^ [y] ) 
break, if r^ is empty 

{^,y)^ pop{TS) 

else 

<1 decide the next node to be traversed 
CNW ^- (^,>')'s children nodes 
forall(;i:c,>'c) e CA' do 

< compute the number of threads that detect overlap at node {xc,yc) 
write over/ap(BVHfl(q)[.\>],BVHi[yc]) (0 or 1) into 5M[f;j] accordingly 
compute local summation Sc in parallel by all threads in chunk 

end for 

if maXf ie > then 

<] select the node that is overlapped in the most threads 

(x,y) ^ CA'fargmaXp.Sf] and push others into TS 
else 

< select the node from the top of stack 
break, if TS is empty 

(x,y) ^ pop{TS) 
end if 
end if 
end loop 



4.2 Parallel Collision Query with Workload Balancing 

Both Algorithm [1] and Algorithm |2] use the per-thread per-query strategy, which is 
easy to implement. However, when the idle threads wait for busy threads or when 
the execution path of threads diverges, the parallel efficiency on the GPUs is low. 
Algorithmic can reduce this problem in some cases, but it still distributes the tasks 
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among the separate GPU cores and cannot make full use of the GPU's computational 
power. 

In this section, we present the parallel collision query algorithm based on work- 
load balancing which further improves the performance. In this algorithm, the task 
of each thread is no longer one complete collision query or continuous collision 
query (for local planning). Instead, each thread only performs BV overlap tests. In 
other words, the unit task for each thread is distributed in a more fine-grained man- 
ner. Basically, we formulate the problem of performing multiple collision queries 
as a pool of BV overlap tests which can be performed in parallel. It is easier to 
distribute these fine-grained tasks in a uniform manner onto all the GPU cores, and 
thereby balancing the load among them, than to distribute the collision query tasks. 

All the tasks are stored in Q large work queues in the GPU's main memory, which 
has a higher latency compared to the shared memory. When computing a single col- 
lision query 11161 , the tasks are in the form of BVTT nodes [x^y). Each thread will 
fetch some tasks from one work queue into its local work queue on the shared mem- 
ory and traverse the corresponding BVTT nodes. The children generated for each 
node are also pushed into the local queue as new tasks. This process is repeated 
for all the tasks remaining in the queue, until the number of threads with full or 
empty local work queues exceeds a given threshold (we use 50% in our implemen- 
tation) and non-empty local queues are copied back to the work queues on main 
memory. Since each thread performs simple tasks with few branches, our algorithm 
can make full use of GPU cores if there are sufficient tasks in all the work queues. 
However, during BVTT traversal, the tasks are generated dynamically and thus dif- 
ferent queues may have varying numbers of tasks and this can lead to an uneven 
workload among the GPU cores. We use a balancing algorithm that redistributes the 
tasks among work queues (Figure |2ll. Suppose the number of tasks in each work 
queue is «,, 1 < / < Q. Whenever 3/, n, < T[ or n, > r„, we execute our balancing 
algorithm among all the queues and the number of tasks in each queue becomes 

nl = — M — , 1 < ' < 2, where T[ and 7], are two thresholds (we use chunk-size for 
7] and the W — chunk-size for 7],, where W is the maximum size of work queue). 

In order to handle A' collision queries simultaneously, we use several strategies, 
which are similar to the ones highlighted in Figure [T] First, we can repeat the sin- 
gle query above algorithm Iil6il for each query. However, this has two main disad- 
vantages. First, the GPU kernel has to be called A^ times from the CPU, which is 
expensive for large N (which can be ^ 10000 for motion planning applications). 
Secondly, for each query, work queues are initialized with only one item (i.e. the 
root node of the BVTT), therefore the GPU's computational power cannot be fully 
exploited at the beginning of each query, as shown in the slow ascending part in 
Figure [ita). Similarly, at the end of each query, most tasks are finished and some 
of the GPU cores become idle, which corresponds to the slow descending part in 
Figure [Ha). 

As a result, we use the strategy shown in Figure [ijb): we divide the A^ queries 
into \jjP\ different sets each of size M with M <.N and initialize the work queues 
with M different BVTT roots for each iteration. Usually M cannot be N because 
we need to use t ■ M GPU global memory to store the transform information for the 
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Fig. 1 Different strategies for parallel collision query using work queues, (a) Naive way: 
repeat the single collision query algorithm in 1161 one by one; (b) Work queues are initialized 
by some BVTT root nodes and we repeat the process until all queries are performed, (c) is 
similar to (b) except that new BVTT root nodes are added to the work queues by the pump 
kernel, when there are not a sufficient number of tasks in the queue. 

queries, where constant t < #g^"^"^^"g"'°o' ^^jj^j ^g usually use M = 50. In this case, 
we only need to invoke the solution kernel \^~\ times. The number of tasks available 
in the work queues changes more smoothly over time, with fewer ascending and 
descending parts, which implies higher throughput of the GPUs. Moreover, the work 
queues are initialized with many more tasks, which results in high performance at 
the beginning of each iteration. In practice, as nodes from more than one BVTT 
of different queries co-exist in the same queue, we need to distinguish them by 
representing each BVTT node by {x,y,i) instead of {x,y), where / is the index of 
collision query. 

We can further improve the efficiency by using the pump operation (Algo- 
rithm [3] and Fig nil. That is, instead of initializing the work queues after it is com- 
pletely empty, we add M BVTT root nodes of unresolved collision queries into 
the work queues when the number of tasks in it decreases to a threshold (we use 
10 • chunk-size). As a result, the few ascending and descending parts in FigurelHb) 
can be further flattened as shown in Figure [TJc). Pump operation can reduce the 
timing overload of interrupting traversal kernels or copying data between global 
memory and shared memory, and therefore improve the overall efficiency of colli- 
sion computation. 



4.3 Analysis 

In this section, we analyze the algorithms described above using the parallel random 
access machine (PRAM) model, which is a popular tool to analyze the complexity 
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shared WQ[] = local work queue 
initialize WQhy tasks in global work queues 
<1 traverse on work queues instead of BVTTs 
loop 

(x,y. ^ pop{WQ) 
if overlapiBYHa (q,) H ,B VHfo[y] ) then 
if isLeaf{x) && isLeaf(y) then 

update collision status of i-th query if exart/nfe«ecf(BVHa(q,)[.i:],BVH;,[y]) 
else 

WQW ^- (x,y,iyf, children 
end if 
end if 
if WQ is full or empty then 

atomicInc{signal), break 
end if 
end loop 
return it signal > 50%Q 

balance 4)rocess{) 

copy local queue back to global work queue <\ manage_kernel 

compute size of each work queue ni,! <i <Q 
if 3(,;i,- < Ti\\ni > T„ then 

rearrange the tasks so that each queue has nj = '"A '' tasks < balance_kernel 

add more tasks in global queue if S^i "* < Tpu,„p < pump_kernel 

end if 



of parallel algorithms ifTTl . Of course, current GPU architectures have many prop- 
erties that can not be described by PRAM model, such as SIMT, shared memory, 
etc. However, PRAM analysis can still provide some insight into GPU algorithm's 
performance. 

Suppose we have n collision queries, which means that we need to traverse n 
B VTT of the same tree structure but with different geometry configurations. We also 
suppose the GPU has p parallel processors. For convenience, assume n ~ ap,a S Z. 
Let the complexity to traverse the i-th BVTT he W{i), I < i < n. Then the com- 
plexity of a sequential CPU algorithm is Ts{n) ~ Z"=i^(0 and the complexity 



of Algorithm [U would be 7]v(«) = I^Z^max^^j ^(jt/j + y). If we sort {W{i)}'!^ 
in ascending order and denote W*{i) as the (-th element in the new order, we can 
prove TlZl^rnax'j^^Wikp + j) > TLiW*ikp). Therefore 7]v(«) > TLiW*ikp). 

Moreover, it is obvious that X;'=i W{i) > Tf^{n) > '=' , which means 7Af(n) = 
Q{Ts{n)), i.e. 7V(n) is work-efficient ifTTIl . 
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Fig. 2 Load balancing strategy for our parallel collision query algorithm. Each thread keeps 
its own local work queue in local memory. After processing a task, each thread is either 
able to run further or has an empty or full work queue and terminates. Once the number of 
GPU cores terminated exceeds a given threshold, the manage kernel is called and copies the 
local queues back onto global work queues. If no work queue has too many or too few tasks, 
the task kernel restarts. Otherwise, the balance kernel is called to balance the tasks among 
all queues. If there are not sufficient tasks in the queues, more BVTT root nodes will be 
'pumped' in by the pump kernel. 



According to the analysis in Section I4TT1 we know that the expected complexity 
W{i) for (-th BVTT traversal in Algorithm [2] should be smaller than W{i) because 
of the near-optimal traversing order. Moreover, the clustering strategy is similar 
to ordering different BVTTs, so that the BVTTs with similar traversal paths are 
arranged closely to each other and thus the probability is higher that they would 
be distributed on the same GPU core. Of course we can not implement ordering 
exactly because the BVTT traversal complexity is not known a priori. Therefore the 
complexity of Algorithm[2]is 7>(n) « I^^j W*{kp), with W* < W*. 

The complexity for Algorithm [3] is simple: 7b (n) = '°' +B{n), where the 
first item is the timing complexity for BVTT traversal and the second item B{n) 
is the timing complexity for balancing step. As B{n) > 0, the acceleration ratio of 
GPU with p-processors is less than p. We need to reduce the overload of balancing 
step to improve the efficiency of Algorithm[3] 

Therefore, all three algorithms are work-efficient. If B{n) = o{Ts{n)), then 
TN{n) > Tp{n) > TB(n) and Algorithm [3] is the most efficient one. If B{n) is 
Q{Ts{n)), which means the overhead of balancing kernel is large, then it is possible 
to have TB{n) > Tp{n). Moreover, for large models, W{i) would be quite different 
and the performance difference between three algorithms would be larger. 
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5 Implementation and Results 



In this section, we present some details of the implementation and highlight the 
performance of our algorithm on different benchmarks. All the timings reported 
here were recorded on a machine using an Intel Core i7 3.2GHz CPU and 6GB 
memory. We implemented our collision and planning algorithms using CUDA on a 
NVIDIA GTX 480 GPU with 1GB of video memory. 

We use the motion planning framework called gPlanner introduced in Il23ll22ll . 
which uses PRM as the underlying planning algorithm as it is more suitable to ex- 
ploit the multiple cores and data parallelism on GPUs. It can either compute a com- 
plete roadmap or we use a lazy version to perform a single motion planning query. 
We replace the collision module in gPlanner with the new algorithms described 
above. As observed in f23^, more than 90% time of the planning algorithm is spent 
in collision queries, i.e. milestone computation step and local planning step. 

In order to compare the performance of different parallel collision detection algo- 
rithms, we use the benchmarks highlighted in Figure [3] Their geometric complex- 
ities are highlighted in Table [T] For rigid body benchmarks, we generate 50,000 
random configurations and compute a collision-free path by using different variants 
of our parallel collision detection algorithm. For articulated model benchmark, we 
generate 100,000 random configurations. For milestone computation, we directly 
use the collision detection algorithms. For local planning, we first need to unfold all 
the interpolated configurations: we denote the BVTT for the y-th interpolated query 
between the /-th local path as BVTT((,y) and its node as {x,y, i,j). In order to avoid 
unnecessary computations, we first add BVTT root nodes with small j into the work 
queues, i.e. (l,l,;,y) -< (1, 1, /',/), ify </. As a result, once a collision is found at 
BVTT(«,7o), we need not to traverse BVTT(/,7) when j > jq. 



Table 1 Geometric complexity of our benchmarks. Large-piano is a piano with more vertices 
and faces by subdividing the piano model. 
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helicopter 


humanoid 


#robot-faces 
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(a) piano (b) helicopter 

Fig. 3 Benchmarks for parallel collision queries. 
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Table 2 Comparison of different algorithms in milestone computation (timing in millisec- 
onds). 32 and 128 are the different sizes used for the traversal stack; C and no-C means using 
pre-clustering and not using pre-clustering, respectively; timing of Algorithm[3]includes two 
parts; traversal part and balancing part. 





Algorithm [T] 


Algorithm|2] 


Algorithm|3] | 




32, no-C 


32, C 


128, no-C 


128, C 


32, no-C 


32, C 


128, no-C 


128, C 


traverse 


balancing 


piano 


117 


113 


239 


224 


177 


131 


168 


130 


68 


3.69 


large-piano 


409 


387 


738 


710 


613 


535 


617 


529 


155 


15.1 


helicopter 


158 


151 


286 


272 


224 


166 


226 


163 


56 


2.3 


humanoid 


2392 


2322 


2379 


2316 


2068 


1877 


2073 


1823 


337 


106 



Table 3 Comparison of different algorithms in local planning (timing in milliseconds). 32 and 
128 are the different sizes used for the traversal stack; C and no-C means using pre-clustering 
and not using pre-clustering, respectively; timing of Algorithm|3]includes two parts; traversal 
part and balancing part. 





Algorithm [T] 


Algorithm |2] 


Algorithmic] | 




32, no-C 


32, C 


128, no-C 


128, C 


32, no-C 


32, C 


128, no-C 


128, C 


traverse 


balancing 


piano 


1203 


1148 


2213 


2076 


1018 


822 


1520 


1344 


1054 


34 


large-piano 


4126 


3823 


8288 


7587 


5162 


4017 


7513 


6091 


1139 


66 


helicopter 


4528 


4388 


7646 


7413 


3941 


3339 


5219 


4645 


913 


41 


humanoid 


5726 


5319 


9273 


8650 


4839 


4788 


9012 


8837 


6082 


1964 



For Algorithm [T] and Algorithm [21 we further test the performance for differ- 
ent traversal sizes (32 and 128). Both algorithms give correct results when using a 
larger stack size (128). For smaller stack sizes, the algorithms will stop once the 
stack is filled. Algorithm [T] may report a collision when the stack overflows while 
Algorithm|2]returns a collision-free query. Therefore, Algorithm[T|may suffer from 
false positive errors while Algorithm |2] may suffer from false negative errors. We 
also compare the performance of Algorithm[T]and Algorithm|2]when the clustering 
algorithm described in Section |4J1 is used and when it is not. 

The timing results are shown in Tableland Table [3] We can observe: (1) Al- 
gorithm [T] and Algorithm |2] both work better when local traverse stack is smaller 
and when pre-clustering is used. However for large models, traversal stack of size 
32 may overflow and the collision results will be incorrect, which happens for the 
large-piano benchmarks in Table [2] and Table [3] Algorithm [T]s performance will be 
terribly reduced when traverse stack size increases to 128 while Algorithmic does 
not change much. The reason is that Algorithm [2| uses per-packet stack, which is 
about 32 times less than using per-thread stack. Clustering and packet can result in 
a more than 50% speed-up. Moreover, the improvement of Algorithm[2|over Algo- 
rithm[T]is increased on larger models (large-piano) than on smaller models (piano). 
(2) Algorithm[3]is usually the fastest one among all the variations of the three algo- 
rithms. It can result in more than 5-lOx increase in acceleration. 
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Fig. 4 GPU throughput improvement caused by pump kernel. Left figure shows the through- 
put without using the pump and right figure shows the throughput using the pump. 



As observed in 112311221 . all these benchmarks are dominated by milestone com- 
putation and local planning steps as part of the overall parallel motion planning 
framework. The two parts take more than 50% running time in both the basic PRM 
and lazy PRM. Therefore, the overall planning algorithm can be improved by at 
least 40%-45%. 

In Figure |4l we also show how the pump kernel increases the GPU throughput 
(i.e. the number of tasks available in work queues for GPU cores to fetch) in work- 
load balancing based algorithm Algorithm [3l The maximum throughput (i.e. the 
maximum number of BV overlap tests performed by GPU kernels) increases from 
8 X 10^ to nearly 10^ and the minimum throughput increases from to 2.5 x 10^. 
For piano and helicopter, we can compute a collision-free path from the initial to 
the goal configuration in in 879ms and 778ms separately using PRM or 72.79ms or 
72.68ms using lazy PRM. 



6 Conclusion and Future Work 

In this paper, we introduce two novel parallel collision query algorithms for real- 
time motion planning on GPUs. The first algorithm is based on configuration-packet 
tracing, is easy to implement and can improve the parallel performance by perform- 
ing more coherent traversals and reduce the memory consumed by traversal stacks. 
It can provide more than 50% speed-up as compared to simple parallel methods. The 
second algorithm is based on workload balancing, and decomposes parallel collision 
queries into fine-grained tasks of B VTT node operations. The algorithm uses a light- 
weight task-balancing strategy to guarantee that all GPU cores are fully loaded and 
achieves close to the peak performance on GPUs. It can provide 5-lOX speed-up 
compared to simple parallel strategy. The overall performance of the GPU-based 
randomized planner also increases more than 50% when compared to the previous 
GPU planner. 

There are many avenues for future work. We are interested in using more ad- 
vanced sampling schemes with the planner to further improve its performance and 
allow us to work on motion planning problems with narrow passages. Furthermore, 
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we would like to adjust the planner to generate smooth paths and integrate our plan- 
ner with certain robots (e.g. PR2). 

Acknowledgements. This work was supported in part by ARO Contract W911NF-04-1- 
0088, NSF awards 0636208, 0917040 and 0904990, DARPA/RDECOM Contract WR91CRB- 
08-C-0137, and Intel. 



References 

1. Aila, T, Laine, S.: Understanding the efficiency of ray traversal on GPUs. In: High Per- 
formance Graphics, pp. 145-149 (2009) 

2. Akinc, M., Bekris, K.E., Chen, B.Y., Ladd, A.M., Plaku, E., Kavraki, L.E.: Probabilistic 
roadmaps of trees for parallel computation of multiple query roadmaps. In: Robotics Re- 
search. Springer Tracts in Advanced Robotics, vol. 15, pp. 80-89. Springer, Heidelberg 
(2005) 

3. Amato, N., Dale, L.: Probabilistic roadmap methods are embarrassingly parallel. In: In- 
ternational Conference on Robotics and Automation, pp. 688-694 (1999) 

4. Banerjee, A., Ghosh, J.: Scalable clustering algorithms with balancing constraints. Data 
Mining and Knowledge Discovery 13(3), 365-395 (2006) 

5. Barraquand, J., Latombe, J.C.: Robot motion planning: A distributed representation ap- 
proach. International Journal of Robotics Research 10(6) (1991) 

6. Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J.W., Skadron, K.: A performance 
study of general-purpose applications on graphics processors using cuda. Journal of Par- 
allel and Distributed Computing 68(10), 1370-1380 (2008) 

7. Foskey, M., Garber, M., Lin, M., Manocha, D.: A voronoi-based hybrid planner. In: 
International Conference on Intelligent Robots and Systems, pp. 55-60 (2001) 

8. Gunther, J., Popov, S., Seidel, H.P., Slusallek, P.: Realtime ray tracing on GPU with 
BVH-based packet traversal. In: IEEE Symposium on Interactive Ray Tracing, pp. 113- 
118(2007) 

9. Harris, M.: Optimizing parallel reduction in CUDA. NVIDIA Developer Technology 
(2009) 

10. Hoff, K., Culver, T, Keyser, J., Lin, M., Manocha, D.: Interactive motion planning us- 
ing hardware accelerated computation of generalized voronoi diagrams. In: International 
Conference on Robotics and Automation, pp. 2931-2937 (2000) 

11. JaJa, J.: An introduction to parallel algorithms. Addison Wesley Longman Publishing 
Co., Inc., Amsterdam (1992) 

12. Kavraki, L., Svestka, P., Latombe, J.C., Overmars, M.: Probabilistic roadmaps for path 
planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and 
Automation 12(4), 566-580 (1996) 

13. Kim, D., Heo, J.P, Huh, J., Kim, J., Yoon, S.E.: HPCCD: Hybrid parallel continuous 
collision detection using cpus and gpus. Computer Graphics Forum 28(7), 1791-1800 
(2009) 

14. Kuffner, J., LaValle, S.: RRT-connect: An efficient approach to single-query path plan- 
ning. In: International Conference on Robotics and Automation, pp. 995-1001 (2000) 

15. Larsen, E., Gottschalk, S., Lin, M., Manocha, D.: Distance queries with rectangular 
swept sphere volumes. In: International Conference on Robotics and Automation, pp. 
3719-3726 (2000) 



228 J. Pan and D. Manocha 

16. Lauterbach, C, Mo, Q., Manocha, D.: gproximity: Hierarchical gpu-based operations for 
collision and distance queries. Computer Graphics Forum 29(2), 419^28 (2010) 

17. LaValle, S.M.: Planning Algorithms. Cambridge University Press, Cambridge (2006) 

18. Lin, M., Manocha, D.: Collision and proximity queries. In: Handbook of Discrete and 
Computational Geometry, pp. 787-808. CRC Press, Inc., Boca Raton (2004) 

19. Lindholm, E., Nickolls, J., Oberman, S., Montrym, J.: NVIDIA Tesla: A unified graphics 
and computing architecture. IEEE Micro 28(2), 39-55 (2008) 

20. Lozano-Perez, T., O'Donnell, P.: Parallel robot motion planning. In: International Con- 
ference on Robotics and Automation, pp. 1000-1007 (1991) 

21. Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Kriiger, J., Lefohn, A.E., Purcell, 
T: A survey of general-purpose computation on graphics hardware. Computer Graphics 
Forum 26(1), 80-113 (2007) 

22. Pan, J., Lauterbach, C, Manocha, D.: Efficient nearest-neighbor computation for GPU- 
based motion planning. In: International Conference on Intelligent Robots and Systems 
(to appear, 2010) 

23. Pan, J., Lauterbach, C, Manocha, D.: g-planner: Real-time motion planning and global 
navigation using GPUs. In: AAAl Conference on Artificial Intelligence, pp. 1245-1251 
(2010) 

24. Pisula, C, Hoff, K., Lin, M.C., Manocha, D.: Randomized path planning for a rigid 
body based on hardware accelerated voronoi sampling. In: International Workshop on 
Algorithmic Foundation of Robotics, pp. 279-292 (2000) 

25. Plaku, E., Bekris, K.E., Kavraki, L.E.: Oops for motion planning: An online open-source 
programming system. In: International Conference on Robotics and Automation, pp. 
3711-3716 (2007) 

26. Sud, A., Andersen, E., Curtis, S., Lin, M., Manocha, D.: Real-time path planning for 
virtual agents in dynamic environments. IEEE Virtual Reality, 91-98 (2007) 

27. Talamadupula, K., Benton, J., Schermerhorn, P.: Integrating a closed world planner with 
an open world. In: ICAPS Workshop on Bridging the Gap Between Task and Motion 
Planning (2009) 

28. Tang, M., Manocha, D., Tong, R.: Meed: Multi-core collision detection between de- 
formable models. Graphical Models 72(2), 7-23 (2010) 

29. Zhang, L., Manocha, D.: A retraction-based RRT planner. In: International Conference 
on Robotics and Automation, pp. 3743-3750 (2008) 



CCQ: Efficient Local Planning Using 
Connection Collision Query 

Min Tang, Young J. Kim, and Dinesh Manocha 



Abstract. We introduce a novel proximity query, called connection collision query 
(CCQ), and use it for efficient and exact local planning in sampling-based motion 
planners. Given two collision-free configurations, CCQ checks whether these con- 
figurations can be connected by a given continuous path that either lies completely 
in the free space or penetrates any obstacle by at most e, a given threshold. Our 
approach is general, robust, and can handle different continuous path formulations. 
We have integrated the CCQ algorithm with sampling-based motion planners and 
can perform reliable local planning queries with little performance degradation, as 
compared to prior methods. Moreover, the CCQ-based exact local planner is about 
an order of magnitude faster than prior exact local planning algorithms. 

1 Introduction 

Planning a collision-free path for a robot amongst obstacles is an important problem 
in robotics, CAD/CAM, computer animation and bioinformatics. This problem is 
well studied and many approaches have been proposed. Over the last few decades, 
sampling-based motion planners such as probabilistic roadmaps ifTOl (PRMs) or 
rapidly-exploring random trees [15| (RRTs) have been shown to be successful in 
terms of solving challenging problems with high degrees-of-freedom (DoFs) robots. 
These planners attempt to capture the topology of the free space by generating 
random configurations and connecting nearby configurations using local planning 
algorithms. 

The main goal of a local planner is to check whether there exists a collision-free 
path between two free configurations. It is important that the local planner should 
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be reliable and does not miss any collisions with the obstacles (8l[Tl. Moreover, a 
significant fraction of the overall running time of a sampling-based planner is spent 
in the local planning routines. 

The simplest local planning algorithms compute a continuous interpolating path 
between free configurations and check the path for collisions with any obstacles. 
These algorithms sample the continuous path at a fixed resolution and discretely 
check each of the resulting configurations for collisions. These fixed-resolution lo- 
cal planning algorithms are simple to implement, but suffer from two kinds of 
problems: 

1 . Collision-miss: It is possible for the planner to miss a collision due to insufficient 
sampling. This can happen in narrow passages or when the path lies close to 
the obstacle boundary, or when dealing with high DOF articulated models. This 
affects overall accuracy of the planner. 

2. Collision-resolution: Most planners tend to be conservative and generate a very 
high number of samples, which results in a lot of discrete collision queries and 
affects the running time of the planner. 

Overall, it is hard to compute the optimal resolution parameter that is both fast and 
can guarantee collision-free motion. In order to overcome these problems, some 
local planners use exact methods such as continuous collision detection (CCD) 
ifTSllSOl or dynamic collision checking ll23l . However, these exact local planning 
methods are regarded as expensive and are much slower than fixed-resolution local 
planners 1231 . Many well-known implementations of sampling-based planners such 
as OOPSMIU and MSlH only use fixed-resolution local planning, though MPK0 
performs exact collision checking for local planning. 

Main Results. We introduce a novel proximity query, namely connection col- 
lision query (CCQ), for fast and exact local planning in sampling-based motion 
planners. At a high level, our CCQ algorithm can report two types of proximity 
results: 

• Boolean CCQ query: Given two collision-free configurations of a moving robot 
in the configuration space, CCQ checks whether the configurations can be con- 
nected by a given path that lies in the free space, namely Boolean CCQs query. In 
addition, the CCQ query can also check whether the path lies partially inside the 
obstacle region (C-obstacle) with at most e-penetration, namely Boolean CCQp 
query. In this case, the robot may overlap with some obstacles and the extent of 
penetration is bounded above by e. 

• Time of violation (ToV) query: If the Boolean queries report FALSE {i.e. the 
path does not exist), the CCQ query reports the first parameter or the configura- 
tion along the continuous path that violates these path constraints. 

Moreover, our algorithm can easily check different types of continuous paths in- 
cluding a linear interpolating motion in the configuration space or a screw motion. 



' |http : / /www. kavrakilab . org/OOPSMP| 

^ihttp: //msl . cs .uiuc . edu/J 

^ |http: //robotics . Stanford. edu/~mitul/mpk/| 
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We have integrated our CCQ algorithm into well-known sampling-based motion 
planners and compared their performance with prior methods. In practice, we ob- 
serve that an exact local planning algorithm based on the CCQ query can be at most 
two times slower than a fixed-resolution local planning based on PRM and RRT, 
though the paths computed using CCQ queries are guaranteed to be collision-free. 
Finally, we also show that our CCQ algorithm outperforms prior exact local planners 
by one order of magnitude. 

Paper Organization. The rest of this paper is organized as follows. In Sec.|2l we 
briefly survey the related work and formulate the CCQ problem in Sec. [3] Sections|4] 
and|3]describe the CCQ algorithms for rigid robots with separation and penetration 
constraints, respectively. We describe how our CCQ algorithm can be extended to 
articulated robots in SecjSl and highlight the results for different benchmarks in 
SecH 

2 Previous Work 

Our CCQ algorithm is related to continuous collision detection. In this section, we 
give a brief survey on these proximity queries and local planning. 

2.1 Continuous Collision Detection 

The term continuous collision detection was first introduced by Redon et al. IfTSl in 
the context of rigid body dynamics, even though the earlier work on similar prob- 
lems dates back to the late 1980s |3|. The main focus of CCD algorithms lies in find- 
ing the first time of contact for a fast moving object between two discrete collision- 
free configurations. Many CCD algorithms for rigid models have been proposed 
1241 : these include algebraic equation solvers, swept volume formulations, adaptive 
bisection approach, kinetic data structures approach, Minkowski sum-based formu- 
lations and conservative advancement (CA). 

For articulated models, Redon et al. lfT9l present a method based on continuous 
OBB-tree test, and Zhang et al. |[30j have extended the CA method to articulated 
models. In the context of motion planning, Schwarzer et al. Il23l present a dynamic 
collision checking algorithm to guarantee a collision-free motion between two con- 
figurations. These algorithms have been mainly used for rigid body dynamics and 
their application to sampling-based planning has been limited ||23]| . In practice, the 
performance of these exact local planning methods is considered rather slow for 
motion planners. Moreover, current CCD algorithms are not optimized for report- 
ing Boolean results and cannot handle penetration queries such as CCQp, that are 
useful for local planners and narrow passages. 

2.2 Local Planning 

There are two important issues related to our work in terms of local planning: 
the type of continuous interpolating path and the validity of the path in terms of 
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collisions. The former is related to motion interpolation between collision-free sam- 
ples, and the latter is related to collision checking. 

2.2.1 Motion Interpolation 

In the context of local planning, different types of motion interpolation methods 
have been used such as linear motion in C-space 1231 . spherical motion in C-space 
ifTTIl . screw motion ll20ll . etc. These motion trajectories are rather simple to compute 
and cost-effective for local planning. 

More sophisticated motion interpolation techniques have been introduced to find 
an effective local path by taking into account the robot/obstacle contacts ||9l [51, 
variational schemes ll25l and distance constraints ll27l . Amato et al. Ul evaluate 
different distance metrics and local planners, and show that the translational distance 
becomes more important than the rotational distance in cluttered scenes. 

2.2.2 Collision Checking 

Given a path connecting two collision-free configurations, a conventional way of 
local planning is to sample the path at discrete intervals and perform static collision 
detection along the discrete path llT3l[T4l . Some exact collision checking methods 
have been proposed for local planning such as Il23l l4ll using adaptive bisection. 

Since collision checking can take more than 90% of the total running time in 
sampling-based planners, lazy collision, lazy collision evaluation techniques have 
been proposed ll22l |2l to improve the overall performance of a planner. The main 
idea is to defer collision evaluation along the path until it is absolutely necessary. 
While these techniques help to greatly improve the performance of PRM-like algo- 
rithms, but they do not improve the reliability of resolution-based collision checkers. 

When narrow passages are present in the configuration space, it is hard to capture 
the connectivity of the free space by using simple collision checking, since it may 
report a lot of invalid local paths. However, some retraction-based planners fT] 
m l28l allow slight penetration into the obstacle region based on penetration depth 
computation, which makes the local planning more effective. 



3 Problem Formulation 

We start this section by introducing the notation that is used throughout the paper. 
Next, we give a precise formulation of CCQ. 

3. 1 Notations and Assumptions 

We use bold-faced letters to denote vector quantities {e.g. o). Many other symbols 
used in the paper are given in Table[T] We assume that both the robot 21 and obstacle 
*8 are rigid and defined in M? workspace. Moreover, the robot has 6 DoFs and the 
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Table 1 Notations. 



Notation 


Meaning 


2l,<8,a2l,^<8 


robot, obstacle and their boundaries 


■^ 


C-space of 21 


q,qW 


a sample in C-space and a ID curve in C-space 


a(q), 21(f) 


placements of the robot 21 at q and q(?) 


^,G 


C-free and C-obstacle region in "^ {i.e. ^ = ^ U ^) 


Ih-ll 


Euclidean distance operator 



obstacle is fixed in space; thus, the C-space of 21 is SE(3). We briefly discuss how 
to handle high DoF robots later in Sec.|6l 



3.2 Local Planning in Sampling-Based Motion Planner 

Given the starting qo and goal qi configurations in ,^, most sampling-based ran- 
domized planners compute a search graph ^ to explore the C-space, where the ver- 
tex corresponds to a sample in jF and each edge corresponds to a ID curve in 
C-space connecting two collision-free samples. More specifically, sampling-based 
planners work in the following manner: 

1. Sample Generation: Sample a collision-free configuration qi in ^ . 

2. Local Planning: Check whether qi can be connected to a vertex qo in ^ by some 
collision-free, continuous path q(r) in C-space. If so, a new edge connecting 
qo, qi is created and added to ^ along with the vertex qi . 

3. Graph Search: Perform graph search on ^ to find a path from qo to qi . If such 
a path is found, the algorithm reports the path; otherwise, go back to step 1 and 
repeat. 

In the local planning step, the choice of a continuous path q(r) interpolating qo,qi 
may vary depending on the topology of J^ . Once a specific path formulation is 
chosen, the algorithm needs to check whether that path is collision-free or not. 

3.3 Connection Collision Query 

Now we define the CCQ proximity query, the main problem to solve in this pa- 
per. Let us assume that two collision-free samples qo,qi G ."^ in ^ and a time- 
parameterized, continuous \D curve q(r) in ^ connecting qo and qi for t G [0, 1]; 
i.e. q(0) = qo,q(l) = qi. Then, the CCQ with separation constraint is formally de- 
fined as checking whether the following predicate CCQ.s is TRUE: 

CCQ, : Vr e [0, 1] => q(r) e JT. (1) 

Moreover, if CCQ, is FALSE, we want to determine the maximum value of t that 
satisfies CCQ,. We call such t as the time of violation (ToV) with separation, T,. 
More formally. 
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T, = max{V.se [0,f] |q(5) e^}. (2) 

The Boolean CCQ^ query is useful for local planning in PRM and RRT, and the ToV 
CCQs query can be used for local planning or the expansion step in RRT. 

On the other hand, the notion of CCQ with e-penetration is a less restrictive 
version of connection query than CCQ with separation constraint, as it allows slight 
penetration (quantified by e) into the C-obstacle region for the C-space curve q(f). 
Formally, we define CCQ with e-penetration as checking whether the following 
predicate CCQp is TRUE: 

CCQ,, : Vr e [0, 1] => {q(r) e JT} V 
{q(r) e ^ A Vp e a(0 nQ5, ||p-5»|| < e}. (3) 

Furthermore, if CCQp is F AL S E , we also determine the maximum value of / that sat- 
isfies CCQp, called the ToV with e-penetration, Tp. More formally, Tp is defined as: 

Tp = max{ \fs e [0,t] I q{s) e .^ V 
{q{s)e& A Vpe2l(5)nQ3,|lp-a<B|| <e}}. (4) 

The CCQp query can be used for PRM and RRT when a small amount of penetration 
is allowed for a robot along the local path. Moreover, retraction-based planners may 
use CCQp to generate samples with slight penetration QISllll. 



4 CCQ with Separation Constraint 

In this section, we present our algorithm to perform the CCQ^ query. We start this 
section by explaining the conservative advancement (CA) technique upon which 
our CCQ algorithm is based. Next, we explain the procedure to compute the ToV 
information T, in Eq|2] along with CCQ^. Finally, we provide a fast technique to 
solve the Boolean version of CCQs ('■£■ EqlH). 



4.1 Conservative Advancement 

Our CCQ algorithm is based on the conservative advancement (CA) algorithm lfT6l 
for convex objects undergoing continuous motion. In CA, the time of contact (ToC) 
T between two convex objects 21 and *B is obtained by iteratively advancing 21 by 
Ats toward 5B without generating collisions. Here, At^ can be calculated by: 

At, < MM (5) 

where jJL is the bound of motion of 2l(?) for t £ [0, 1] projected onto the closest 
direction from 2l(/) to 05, known as the directional motion bound ll29l . Then, the 
ToC is obtained as: 
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T = Y^Ati (6) 

where Atl denotes the rth CA iteration. The iteration continues until ||21(t), Q5|| « 0. 
This idea can be extended to non-convex models using bounding volume hierar- 
chies ||24| . 



4.2 Time of Violation Query for CCQs 

In case of CCQ,, the time of violation (ToV) is equivalent to the time of contact 
(ToC) in CA. Moreover, if the path q(f) is a linear motion in C-space, one can 
employ the C'^A algorithm ll24l based on CA to compute T^ for the robot 21. We 
also show that we can devise a variant of C^A algorithm that can handle the screw 
motion for q(r). 

The screw motion consists of rotation about an axis CO in space by an angle of 6 
radians, followed by translation along the same axis by an amount of d as shown 
in FiglHThe screw motion can be represented by using four parameters {co, 6, a, d), 
where a is any point on the axis co. Given two configurations qo and qi in SE(3), 
the screw parameters can be easily computed [211 . 

The main challenge in computing T, under screw motion is to compute the direc- 
tional motion bound ;U for EqlS] Let us assume that our robot 21 is convex with the 
origin o^ of the body attached frame. Let p be any point on 21 with p'' representing 
the same point but defined with respect to the body frame, n be the closest direc- 
tion from 21 to the obstacle *B at / = 0, px be a vector projected from p to the axis 
CO. Then, an upper bound jj. of the motion of any point on 2t under screw motion, 
projected onto n is: 

jj. =max ( J {p{t)-n)dt 
pea Vo 

max ( f {{\ + CO X p±{t)) ■ a) dt 

(7) 



pG2l 



< max (v -11,0) + II CO x n|| max / ||px (f)||c/f 

\PG2l Vb 

< max Wco -11,0) + ||a) x n|| | ||(o^-a) x a)|| +max||p'-'| 

\ ii\ / II pg2^ III- I 




Fig. 1 Screw Motion. 
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Note that max lip'' 1 1 can be calculated as preprocess, since p* is defined with respect 

pea " " 

to the body frame. A similar bound can be obtained for other motion trajectories 
such as spherical motions IfTTl . 

4.3 Boolean Version ofCCQs 

From the previous section, the CCQs predicate in Eq[T|can be trivially determined 
by checking whether T, > 1 (TRUE) or not (FALSE). However, one can devise a 
more efficient way to answer the CCQ., predicate without explicitly computing Tj. 

Given the starting qo and goal qi configurations, the main idea in evaluating 
CCQs is to perform dual advancements from both end-configurations qo,qi with 
opposite velocities, and iterate this process until collision is found or the path turns 
out to be collision-free. The dual advancement is more effective than the normal 
advancement using a single end-configuration since the normal advancement is al- 
ways conservative (i.e. collision will be never identified until the final ToV value is 
obtained). 

More specifically, as shown in Fig.|2l we perform a single CA iteration from qo 
towards qi as before and compute the first advancement time, At^ . Similarly, we 
perform another CA iteration but from qi towards qo with a negative velocity (e.g. 
(—V, —ft))) and compute the advancement time, At^ . 



a 






Fig. 2 A Single Step in the Boolean Query. Dual advancements are performed from qo 
towards qi by AIq , and from qi towards qo by At^ . The collision is checked at q( j). 



If {AtQ + At^ ) > 1, then the entire path q(f) is collision-free, thus the predicate 
is returned as TRUE; otherwise, we bisect the time interval at / 1 = %^ and perform 
collision detection at the configuration q(f i ). If collision is detected at q(f i ), CCQ^ 
is reported as FALSE and the procedure is terminated. Otherwise, the same dual CA 
procedure is executed recursively on two sub-paths, {[q(zif(J"),q(/i )], [q(f i ),q(l — 

zi?f)]}. Note that the remaining path segments {[q(0),q(zif(J")], [q(l — ziff ),q(l)]} 
are collision-free because of conservative advance mechanism. This procedure is it- 
erated until the separation condition is satisfied or a collision is detected. We provide 
a pseudo-code for CCQs in Alg[T] 
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Algorithm 1. CCQ, 

Input: initial and goal configurations qo, qi, interpolating motion q(f) 

Output: whether Eq.[T]is TRUE or FALSE 



{Initialize the queue with [q(0),q(l)].} 
while Queue 7^ do 

Pop an element [q(ffl),q(';j)] from the queue; 

n =^; 

if q(f 1 ) is in-collision tlien 

return FALSE; 
end if 

Perform CA from q(fa) with a positive velocity and find the step size At^^; 

Perform CA from q(fi) with a negative velocity and find the step size 4f^: 

if (4f+ + 4f^) < {tb-t„) then 

Push [q{ta+At^),q{ti)] and [q(n ),q(ffo — 4?^)] onto the queue; 

end if 
end while 
return TRUE; 



5 CCQ with Penetration Constraints 

The CCQi algorithm presented in Sec |4] strictly imposes that the interpolating path 
q(r) should lie entirely inside ^. However, this condition is rather restrictive since 
a slight overlap between the robot and the obstacles may be useful in practice and 
is used by retraction-based planners Q ID |28| . For instance, often the curved sur- 
face model of a robot is tessellated with some surface deviation error e and thus e- 
penetration does not necessarily imply actual interference H. The notion of CCQp 
is that we allow slight penetration for a robot along the path as long as the penetra- 
tion amount is less than some threshold, e. 

5.1 Penetration Depth 

To quantify the amount of penetration for a robot 21, we need a suitable metric. The 
penetration depth (PD) is a proper metric to quantify the amount of overlap between 
2t and 05. In the literature, different types of penetration depth are known |26| and 
in our case, we use pointwise penetration depth 1241 since it is computationally 
cheaper to compute as compared to other penetration measures. 

When 21 and *B overlap, the pointwise penetration depth is defined as the point 
of deepest interpenetration of 21 and 03. Formally, the pointwise penetration depth 
(or PD for short) can be defined as: 

pz) = ^(2in5(2lnS),Q5n5(2lnQ5)) (8) 

where J^{-, •) denotes the two-sided Hausdorff distance operator between surfaces. 
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5.2 Boolean Version of CCQp 

We first explain how to evaluate the CCQp predicate in Eq|3] The main idea of our 
evaluation algorithm is to decompose the advancement step size At into two sub- 
steps Ats and Atp (i.e. At = At^ +Atp) such that collision-free motion is generated 
during At^ while Atp may induce penetration with the PD value being less than 
e. Then, we perform dual CAs from the end-configurations qo,qi like CCQs in 
Sec. 1431 

Since At^ can be calculated just like in Eq.[5] computing At boils down to cal- 
culating Atp. In general, computing Atp can be quite challenging since one needs 
to search the entire C-space (both C-free and C-obstacle) where the placement of 21 
at q{t + At) may yield either collision-free or in-collision configuration. In order to 
compute a feasible solution for Atp, we use a conservative approach. 

The key idea is that, after the advancement of Ats +^tp time step, want to guar- 
antee that the robot still remains collision-free at q{t + Ats+Atp). Taking advantage 
of this constraint, we first move the robot to 2t(r + Ats), and then calculate Atp that 
can bound the motion of 21 by less than 2e so that the possible PD between 2t and 
*B can be less than e, as shown in Fig. [3] 




Fig. 3 Decomposition of the Time Step At into Ats and Atp for CCQp. Ats corresponds to 
the collision-free time step and Atp to the time step that may result in e -penetration. 



More precisely, an upper bound of the time step size Atp can be computed by 

observing the fact that the robot should not travel by more than 2e; otherwise, the 

penetration depth can be greater than e. Thus, assuming that the robot and obstacles 

are both convex, we have: , 2s 

^tp s — {y) 

Mm 

where ;Ll„ is the maximum amount of motion that a point on 21 can make between 
the time interval of [0, 1]. Note that /i„ is an undirected motion bound unlike the 
directed one /i in EqlSj since no closest direction will be defined for a robot in col- 
lision with obstacles. Essentially, /i„ depends on the underlying path. We present 
simple formulas to compute /i„ for both linear (EgfTOb and screw (EqfTTIl mo- 
tions as shown below. Here, p,p'\p±,o'' have the same meanings as defined in 
SecIO 
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Linear Motion 

jU„ = max( /||p,(r)||fl'r 

PG21 Vo 



= max ( rllv + cox p''(t)\\dt ] 
pea Vo / 

< ||v||+max( r||a)xp''(f)||cfr 
pea Vo 

<||v|| + ||a)||max||p''|| 
pea 



(10) 



Screw Motion 



(11) 



M„=max( /||p,(/)||Jf 
pea Vo 

= max( f\\v + (Oxp±(t)\\dt 
pea Vo 

< ||v||+max( [\\(Oxp^{t)\\dt 

pea Vo 

< h\\ + ll«ll ( ll(o*-a) X coll +max||p* 

V pe2i " 

The result of our algorithm is conservative in the sense that our algorithm does not 
report a false-positive result; i.e. if the algorithm reports TRUE, it guarantees that 
CCQp is indeed TRUE. 



5.3 Time of Violation Query for CCQp 

A simple way to compute the ToV in Eq|4] can be devised similarly to evaluating 
CCQp by decomposing the ToV into the one corresponding to collision-free motion 
Ts (EqlJl) and one to e-penetration At' : i.e. 

Tpi = i ^Ati J +Atp, = T, + Atp,. (12) 

Moreover, in order to guarantee e-penetration. At' is calculated such that the motion 
of 21 starting att ~ T, should be bounded above by e: 

At'p < ^. (13) 

Here, the undirected motion bound M» can be calculated similarly as in the previous 
section. However, there are two issues related to computing the ToV, as shown in 
Eqini 

• Tpi provides a lower bound of the ToV with e-penetration, but this may be a loose 
bound since e is typically much smaller than fj.,,. 
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• The placement of the robot at 2l(Tpi) may correspond to an in-colhsion sample. 
This can be problematic for most sampling-based planners where only collision- 
free samples are permitted to represent the connectivity of the free C-space. 

Note that the second issue is more severe than the first one in practice. We introduce 
an alternative way to compute Tp to overcome these issues. 

The main idea is that instead of accumulating the collision-free time steps first 
(i.e. Ts), we intertwine collision-free and in-collision motions for every time step, 
just like the Boolean query in the previous section. Thus, the new ToV Tp2 is: 

T„2=^{Ati + At;,). (14) 

Here, At!., At' for the rth iteration are calculated using Eq.|5]and Eq.|9l respectively. 
The above iteration continues until the rth iteration yields a collision. Thus, by con- 
struction, 2l(Tp2) is collision-free. Moreover, in general, Tpi < Tp2', however this is 
not always true but less likely to happen in practice since Eq.[T4]continues to iterate 
until collision is found unlike Eq.[l2l as illustrated in Fig ID 



21 





H 



Fig. 4 Comparison between Tpi and Ty,2- In general, Tp2 > Tpi since more iterations will be 
performed for Tp2 until collision is found at q(Tp2)- 



6 Extension to Articulated Robots 

Our CCQ algorithms for rigid robots can be extended to articulated robots. The basic 
equations that support CCQ algorithms such as Eqs.|6]or[T2]can be reused as long 
as the directed and undirected motion bounds /i , ;U„ can be calculated. However, this 
turns out to be relatively straightforward. For instance, in the case of linear motion, 
the directed motion bound ji for an articulated robot can be obtained using the same 
motion bound presented by Zhang et al. 1301. Moreover, the spatial and temporal 
culling techniques proposed in the paper to accelerate the query performance are 
also reusable for CCQ queries between articulated models. 



7 Results and Discussion 

In this section, we describe the implementation results of our CCQ algorithms, and 
benchmark the performance of the algorithms by plugging them into well-known, 
sampling-based planners. Finally, we compare our algorithm against prior exact lo- 
cal planning techniques. 
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7.1 Implementation Details 

We have implemented our CCQ algorithm using C++ on a PC running Windows 
Vista, equipped with Intel Dual CPU 2.40GHz and 5GB main memory. We have ex- 
tended public-domain collision libraries such as PQP lfT2l and C^A. Note that these 
collision libraries are designed only for static proximity computation or ToV com- 
putation (similar to Xs) under a linear motion. Throughout the experiments reported 
in the paper, we set the penetration threshold e for CCQp and T^ as one tenth of the 
radius of the smallest enclosing sphere of 21. 

To measure the performance of our algorithms, we have used the benchmarking 
models and planning scenarios as shown in Tableland Fig|5]with sampling-based 
motion planners including PRM and RRT. These benchmarking models consist of 
IK ^ 30K triangles, and the test scenarios have narrow passages for the solution 
path. Typical query time for our CCQ algorithms takes a few milli- seconds; for 
instance, the most complicated benchmark, the car seat, takes 21.2 msec and 28.3 
msec for ToV and Boolean queries, respectively. 




(a) Maze 



(b) Alpha Puzzle 



(c) Car Seat 



(d) Pipe 



Fig. 5 Benchmarking Scenes. For each benchmark scene, the starting and goal configura- 
tions of the robot are colored in red and blue, respectively. 



Table 2 Benchmarking Model Complexities. 



Benchmarks 


21 


03 


#oftri(2l) 


#oftri(Q3) 


Maze 


CAD piece 


Maze 


2572 


922 


Alpha Puzzle 


Alpha 


Alpha 


1008 


1008 


Car seat 


Seat 


Car Body 


15197 


30790 


Pipe 


Pipe 


Machinery 


10352 


38146 



7.2 Probabilistic Roadmap with CCQ 



In Sec l3.2l we have explained the basic steps of sampling-based planners. These 
planners use a different local planning step (the step 2 in Sec. 13.21 ). 

In conventional PRM-based planners, this Boolean checking is implemented 
by performing fixed-resolution collision detection along the path, namely fixed- 
resolution local planning (DCD). In Table [3] we show the performance of PRM 
with DCD with varying resolution parameters and a linear path. Here, the resolution 
parameter means the average number of collision checks performed for each local 
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Table 3 The performance of PRM in seconds based on fixed-resolution local planning (DCD) 
with different resolutions for the maze benchmark. 



Avg. Collision Resolution 


23 


40 


47 


80 


128 


PRM with DCD (Boolean) 12.70s 15.88s 


18.76s 39.49s 


44.75s 



path. We have used the OOPSMP implementation of PRM, and only the maze and 
pipe benchmarks were solvable by OOPSMP within a reasonable amount of time. 
The optimal performance is obtained when the resolution is 23, and as the resolu- 
tion parameter becomes less than 23, the OOPSMP may not be able to compute a 
collision-free path. In any case, the DCD local planner still does not guarantee the 
correctness of the path in terms of collision-free motion. 

However, exact local planning is made possible by running the Boolean version 
of our CCQ algorithm on the path. In Table |4l we highlight the performance of 
CCQ-based local planning algorithms (CCQi and CCQp) with PRM, and compare 
it against that of the DCD local planning method with the optimal resolution param- 
eter. In case of the pipe benchmark, the PRM performance using our algorithm is 
similar to that of the DCD. In case of the maze benchmark, our CCQ-based local 
planner is about 1.8 times slower than DCD local planner. Even for this benchmark, 
when the resolution parameter becomes higher than 80, our CCQ algorithm per- 
forms faster than DCD, even though the DCD local planner still cannot guarantee 
the correctness of the solution path. Also notice that CCQ,, takes less time than 
CCQi since the former is a less restrictive query than the latter. 

Table 4 The performance of PRM using DCD local planner and CCQ-based local planner. 
The CCQ-based local planner can guarantee collision-free motion while the other cannot give 
such guarantees. 



Benchmark 


DCD 


Boolean Query 


CCQ, 


CCQ,, 


Maze 


12.70s 


36.34s 


24.09s 


Pipe 


8425.09s 


9610.13s 


8535.60s 



7.3 Rapidly-Exploring Random Tree with CCQ 

Both ToV and Boolean CCQ can be employed to implement exact local planning 
for RRT planer. Specifically, when the new node is to be extended along some path, 
if the path is not collision-free, the path can be entirely abandoned (Boolean query) 
or the partial collision-free segment of the path before the ToV can be still kept 
(ToV query). 

In Fig. [6l we show the performance of RRT planner with our CCQ algorithms 
and DCD local planner with the optimal resolution parameter. Also, different types 
of motion paths such as linear and screw motion have been tested. We also have 
used the OOPSMP implementation of RRT for this experiment. To find the optimal 
resolution parameter for DCD local planner, we test different resolution parameters 
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Table 5 The performance of RRT in seconds based on fixed-resolution local planning (DCD) 
with different average resolutions for the alpha puzzle benchmark. In this case, RRT uses the 
ToV query. When the resolution is less than 4, RRT cannot find a path 



Avg. Collision Resolution 


4.21 


5.96 


6.01 


6.97 


RRT with DCD (ToV) 


25.60s 0.25s 


2.08s 


39.65s 



■ DCD 

■ CCQs 

■ CCQp 



Fig. 6 The Performance of RRT using DCD and CCQ-based Local Planner. The x-axis 
represents different benchmarking scenes with different queries such as BL (Boolean query 
with a linear motion), BS (Boolean query with a screw motion), TL (ToV query with a linear 
motion), and TS (ToV query with a screw motion) for each benchmark. The y-axis denotes 
the planning time in seconds for the maze and pipe benchmark, in tens of seconds for the 
alpha puzzle, and in hundreds of seconds for the car seat. The blue, red and green bars denote 
the planning time using DCD, CCQ^ -based, and CCQp -based local planners, respectively. 

ranging between [3, 15]; for instance, see Table |5] for the alpha puzzle benchmark 
using the ToV query based on DCD local planner. Similar to PRM, the variation in 
performance depends on the resolution parameter, but it does not show the linear 
relationship between the resolution and performance unlike PRM since comput- 
ing an accurate ToV using higher resolution requires many more collision checks. 
Thus, picking a right value for the resolution parameter is even more difficult in case 
of RRT. 

In our benchmarks, the RRT with CCQ-based local planner is roughly two times 
slower than the one with DCD local planner with the optimal resolution, which 
defined as the minimum resolution to find a path. However, in some cases such as 
the Maze (BS), Alpha puzzle (BL) and pipe (BL) benchmarks in Fig|6l the RRT with 
CCQ-based local planner is even faster than the one with DCD local planner since 
the number of collision checks can be kept minimal. For the car seat benchmark, the 
Boolean query with a screw motion (BS) could not find out a collision-free path in 
a reasonable amount of time. 



7.4 Comparisons with Prior Approaches 

We also compare the performance of our CCQ-based local planning algorithm with 
the prior exact local planning algorithms such as the dynamic collision check- 
ing method (DCC) 1231 implemented in MPK. To the best of our knowledge, the 
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Table 6 Performance Comparisons between dynamic collision checking (DCC), CCQs and 
C^A -based local planner. The timings are the total collision checking time in seconds used 
for local planning. 



Benchmarks 


# of triangles 


CCQ, 


DCC 


C^A 


Pipe 


48K 


0.29s 


1.82s 


1.78s 


Alpha-shape with two Holes 


IK 


4.5s 


63.8s 


17.9s 



A 



Fig. 7 Alpha-Shape Through Two Holes. The red and blue alpha shapes represent the start- 
ing and goal configurations, respectively. 



dynamic collision checking algorithm is the only public-domain exact local planner 
that has been integrated into sampling-based motion planner. 

Since DCC supports only a Boolean query with a linear motion and separa- 
tion constraints, we compare the performance of the Boolean version of our CCQ^ 
against DCC by plugging CCQ^ into the MPK planner, as shown in Table |6l For 
benchmarks, we use the same pipe model in Fig|5]-(d), but shrink the robot a little 
to enable MPK to find a solution path. We also use another benchmark model as 
shown in Fig. |7l the alpha-shape with two holes. In this case, we plan a path for 
an alpha-shape tunnelling through two holes, and measure the average performance 
of DCC and CCQ^-based local planner. We also compare the CCQv algorithm with 
C^A-based local planning algorithm ll24ll in two benchmarks, as shown in Table. |6l 

In our experiments, CCQs-based local planner is about an order of magnitude 
faster than DCC local planner mainly because CCQ uses a tighter, directional mo- 
tion bound than DCC relying on undirectional motion bound. A similar explanation 
was also provided in 1*2911 why the directional bound is superior to the undirectional 
one. Another reason is because of the dual advancement mechanism in CCQ-based 
local planner. Moreover, CCQs is about 5 times faster than C^A in our experiment, 
because of the dual advancement mechanism. 

The ToV version of our CCQ, algorithm has a similar objective as continuous 
collision detection algorithms. Since our algorithm is based on the known fastest 
CCD algorithm C^A |24|, it shows a similar performance of that of C^A. However, 
C^A is not optimized for a Boolean query and does not support CCQ with pene- 
tration constraints. Ferre and Laumond's work [4] supports a penetration query, but 
their work is not available freely and is essentially similar to DCC 



8 Conclusions 



We have presented a novel proximity query, CCQ, with separation and penetration 
constraints. It can be used for efficient and exact local planning in sampling-based 
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planner. In practice, we have shown that the CCQ-based local planner is only two 
times slower or sometimes even faster than the fixed-resolution local planner. More- 
over, CCQ-based local planners outperform the state-of-the-art exact local planners 
by almost an order of magnitude. Our CCQ algorithm can be also extended to a 
more general type of motion as long as its bound can be conservatively obtained. 

There are a few limitations in our CCQ algorithm. Both CCQ^ and CCQp algo- 
rithms are sensitive to threshold values; e.g. the termination condition threshold for 
CA or CCQi and penetration threshold e for CCQp. The motion bound calculation 
such as jl or /x„ depends on the underlying path. When the robot moves with a very 
high rotational velocity, many CA iterations might be required to converge. 

For future work, it may be possible for a planner to try different types of paths 
and automatically choose the suitable or optimal one. We would like to extend our 
CCQ framework to deformable robots. We are also interested in applying our CCQ 
technique to other applications such as dynamics simulation where the ToV compu- 
tation is required. In particular, the use of CCQp may also provide a direction for 
contact dynamics where slight penetration is allowed (e.g. penalty-based method). 
Finally, we would like to design parallel GPU-based extension of CCQ and use it 
for real-time planning [ 17| . 
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Modeling Contact Friction and Joint Friction in 
Dynamic Robotic Simulation Using the Principle 
of Maximum Dissipation 



Evan Drumwright and Dylan A. Shell 



Abstract. We present a unified treatment for modeling Coulomb and viscous fric- 
tion within multi-rigid body simulation using the principle of maximum dissipation. 
This principle is used to build two different methods — an event-driven impulse- 
based method and a time stepping method — for modeling contact. The same prin- 
ciple is used to effect joint friction in articulated mechanisms. Experiments show 
that the contact models are able to be solved faster and more robustly than alterna- 
tive models. Experiments on the joint friction model show that it is as accurate as a 
standard model while permitting much larger simulation step sizes to be employed. 



1 Introduction 

Rigid body dynamics is used extensively within robotics in order to simulate robots, 
learn optimal controls, and develop inverse dynamics controllers. The forward dy- 
namics of rigid bodies in free space has been well understood for some time, and the 
recent advent of differential algebraic equation (DAE) based methods has made the 
dynamics of bodies in contact straightforward to compute as well. However, fast and 
stable robotic simulation remains somewhat elusive. The numerical algorithms used 
to compute contact forces run (on average) in time 0{rr') in the number of contact 
points and are numerically brittle. In this paper, we present a class of methods for 
modeling contact with friction in multi-rigid body simulation that is not only faster 
empirically than existing methods, but is solvable with numerical robustness. We 
also extend our approach to modeling joint friction and show how it is at least as 
accurate as standard joint friction models, while permitting much larger simulation 
step sizes (and thus much greater simulation speed). 
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Our approach centers around the principle of maximal dissipation. Paraphras- 



ing 114411 ■ the principle of maximal dissipation states that, for bodies in contact, the 
friction force is the one force (of all possible friction forces) that maximizes the 
rate of energy dissipation. In this paper, we show that we can use the principle of 
maximal dissipation to implicitly solve for frictional forces, in contrast to prior ap- 
proaches that set the frictional force by explicitly using the direction of relative mo- 
tion. By employing this strategy, we are able to dispense with the complementarity 
constraints that are nearly universally employed and instead formulate the contact 
problem using a convex optimization model. 



2 Methods for Modeling Contact with Friction in Multi-rigid 
Body Simulation 

2.1 Background 

As stated in the previous section, modeling contact with friction in multi-rigid body 
simulation has been extensively conducted using complementarity constraints. Such 
constraints, which take the form a > 0,b > 0,a^b = 0, have been utilized in multi- 
rigid body simulation to ensure that forces are not applied at contact points at which 
bodies are separating |14|], that either sticking or sliding friction is applied 141], and 
that forces are applied only for joints at their limits 12811 . A non-exhaustive survey 
of the literature dedicated to modeling contact with complementarity constraints in- 
cludes SIllllilslHIIlSllliillllsiliSlilllJils^ 



49l 14811 . Initial efforts on modeling contact with friction 1261 l27l |9[] attempted to 
solve a linear complementarity problem (LCP) for the unknown forces and acceler- 
ations. Later work — under which researchers realized that solving for forces and ac- 
celerations could be subject to inconsistent configuration^ psl] — solved instead for 
unknown impulsive forces and velocities; this approach was able to avoid the prob- 
lem of inconsistent configurations. Both iterative sequential impulse schemes (e.g., 
|30l|22[]) and simultaneous impulse-based methods (e.g., 14 14511 ) were employed. 



The former methods have been viewed as splitting methods 11511 for solving linear 
complementarity problems by Lacoursiere, who reported slow convergence rates for 
coupled problems using such approaches [24]. Correspondingly, much of the multi 
rigid-body simulation community currently uses explicit linear or nonlinear com- 
plementarity problem formulations (Anitescu and Potra j^, most prominently) with 
impulses and velocities in order to model contact with friction. 

Fig. [U illustrates why complementarity conditions are necessary at the accelera- 
tion level; further discussion of their requirement at the acceleration level is present 
in ||9[] and lll4ll . Given that using impulsive forces is necessary to avoid the problem 



of inconsistent configurations (exemplified by the problem of 13611 ), it must be dis- 
cerned whether the complementarity conditions are accurate and appropriate in that 
new context (i.e., at the velocity level). We stress that the context is indeed new. 



' An inconsistent configuration is a contact configuration that has no solution using non- 
impulsive forces. 
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Fig. 1 A figure (taken from lilOjl) illustrating the necessity of complementarity constraints 
for resting (non-impacting, non-separating) contact at the acceleration level. Complementar- 
ity constraints would ensure that body A does not move upward unnaturally fast if a force 
were applied to A- say by a strong wind moving between B and A~ accelerating A upward. 
In the absence of such a force, the complementarity condition keeps blocks A and B from 
interpenetrating. 

because only "resting" (zero relative normal velocity) contacts are treated with 
forces at the acceleration levejj while both resting and impacting contact are treated 
with impulses at the velocity level. 

With respect to accuracy, il4ll argues that the complementarity conditions do 
not necessarily reflect reality for impacting contacts. He conducts an experiment 
using carbon paper, a coin, and a hammer, that serves to prove his argument: the 
physical accuracy of complementarity-based models, at least in some scenarios, is 
lacking. Further physical experimentation is warranted, especially given the dom- 
inance of complementarity-constrained-methods at the velocity level in the litera- 
ture, but if we accept that nature does not require complementarity constraints, do 
such conditions lead to models that are more advantageous in some other way {e.g., 
computationally?) 

Linear and nonlinear complementarity problem-based models are, in fact, inferior 
with respect to computation, at least relative to the models introduced in this paper. 
Solutions to bisymmetric LCPs — the form that many models take — are NP-hard in 
the worst case in the number of contact points (though the expected time solution is 
O(n^) [150). The contact models are non-convex, so only a few algorithms are capa- 
ble of solving LCP-based contact models; codes for solving the NCP-based models 
are even more rare^ The LCP-based models linearize the friction cone, and the 



^ Unless a penalty method is used; such methods are irrelevant to this discussion because 
we are assuming non-interpenetrating contact. 

The reader may question the desire to have multiple algorithms capable of solving a contact 
model. Optimization algorithms can be numerically brittle, so using multiple algorithms 
can reduce the likelihood of a simulation failing to progress. 
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fidelity of the friction cone approximation increases the size and, correspondingly, 
the computational cost of the model to be solved. 

Given that complementarity conditions may not be physically warranted and that 
such models can be hard to solve computationally, what is the reason behind their 
strong popularity? We believe this quote from |il4] is instructive: 

It is emphasized that many authors in this area are aware of the possible phys- 
ical inaccuracies behind the complementarity assumption. For example, 1320 
states that a primary benefit of such a formulation is internal mathematical 
consistency, and empirical corrections towards better accuracy can be accom- 
modated later. Baraff (personal communication) mentions that there is no rea- 
son to think that real systems obey the complementarity conditions. However, 
in Pfeiffer and docker's 13811 discussion of the "corner law of contact dynam- 
ics" there is unfortunately no explicit mention of the possible lack of physical 
realism behind the complementarity conditions in the presence of impacts. 
i^ focus on mathematical aspects of their formulation, and also omit discus- 
sion of physical realism. The authors of the latter two authoritative works may 
therefore unintentionally convey an inaccurate impression to a reader who is 
new to the field. 

2.2 Two Representations for the Contact Models 

Given that complementarity conditions are not a necessary feature of contact mod- 
els, we now proceed to present our complementarity-free contact models. We will 
use two representations for these contact models. Both representations have been 
employed previously in the literature. We utilize the two representations here in or- 
der to make the community aware of their existence, to unify them, and, hopefully, 
to expose them to further study (for determination of computational or numerical 
advantages, for example). 

A / b representation 

The first representation formulates the contact model in terms of matrices A and b 
(alternatively named K and u by ISOfl). This representation has been employed by 
iSD, IIBOn . i23n . and lll7n . In this representation, b is a vector of relative velocities in 



the 3M-dimensional contact space and A is the 3m x 3n sized contact space inertia 
matrix that transforms impulsive forces to relative velocities in contact space. A is 
dense, symmetric, and positive-semi definite; the latter two properties were proven 
by 130(1 . The matrices A and b are related by the equation: 

b+=Ax + b (1) 

where b is the vector of relative velocities pre-contact (external forces, such as grav- 
ity, are integrated into this vector), x is the vector of impulses applied in the contact 
space, and b+ is the vector of relative velocities after impulses are applied. 

The matrices A and b can be determined formulaically if the contacting bodies 



are not articulated, and via application of test impulses otherwise (c/, 11301 1231] '). By 
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using test impulses, the contact model can remain ignorant of whether the bodies are 
articulated and, if articulated, of whether the bodies are formulated using maximal 
or reduced coordinates. This is an advantage of the A / b representation; this rep- 
resentation also tends to produce simpler — though not necessarily computationally 
advantageous — objective and constraint function gradients and Hessians for solving 
the contact model. 

Generalized coordinate representation 

The generalized coordinate representation has been used in work by 141] and 14511 . 



among others. This representation uses matrices M (the generalized inertia matrix), 
N (the contact normal Jacobian), J (the joint constraint Jacobian); vectors c„ (the 
contact normal impulse magnitudes), Cy (the joint constraint impulse magnitudes), 
q (the generalized coordinates), v (the generalized velocities) and k (the generalized 
external forces), and scalar h (the step size). 

For models with complementarity constraints, additional matrices are used, both 
to enforce the complementarity constraints and to provide a linearized friction cone; 
we do not list such matrices here. Because our model does not employ comple- 
mentarity constraints, it is able to provide a true friction cone and still remain com- 
putationally tractable. The true friction cone is obtained using Jacobian matrices 
corresponding to the two tangential directions at each contact normal; we denote 
these matrices S and T and the corresponding contact tangent impulses as c, and c,. 

All of the matrices described above are related using the following formulae: 



V 



= M-i {N'^Cn + S"rc, + T^c, + j'^Cj + hk) + v' (2) 

q'+i =q' + hv'+^ (3) 

The second equation reflects that this representation is typically utilized in a semi- 
implicit integration scheme. 

Note that M is symmetric, positive-definite. M, N, S, T and J are sparse and 
correspondingly make efficient determination of the objective and constraint gradients 
and Hessians conceptually more involved than with the A /b representation. Computing 
M, N, S, T and J is quite simple, however, and computationally far more efficient than 
the test impulse method. We show below, however, that A and b can also be determined 
efficiently using the matrices from the generalized coordinate representation. 

Unification of the A / b and generalized coordinate representations 

We may write b and A using the generalized coordinate representation as 



(v' + hM-^k), 



C[NTsTtT] 



where C = M'^ - M-^r{JM-^D ^JM^i. 



These equations show that recovering N, S, T, and J from A and b is not possible. 
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2.3 The Contact Models for Event-Driven and Time-Stepping 
Simulation 



Computer-based rigid body simulation methods have been categorized into three 
schemes by Broghato et al. 11211 : penalty, event-driven, and time- stepping. Given 
that penalty methods necessarily cannot enforce non-interpenetration, we focus in- 
stead on event-driven and time-stepping methods. The former are able to utilize 
arbitrary integration schemes (time- stepping methods are frequently restricted to 
semi-implicit Euler integration) while the latter aim to be able to avoid (possibly 
nonexistent) accuracy problems due to continually restarting the integration process; 
1 121] provide greater detail of the rationale behind development of the time-stepping 
approaches. We note that 1161] has shown that one purported advantage of time- 
stepping approaches over event-driven simulations — avoidance of Zeno points — is 
nonexistent. Sec. 12.41 and 12. SI present both event-driven and time-stepping methods 
in order to show that our method is applicable to both. Sec. l2.4l uses the A/b repre- 
sentation, while Sec. l2.5l uses the generalized coordinate representation. The choice 
of representation used for the individual methods is arbitrary: the two representa- 
tions are interchangeable. 



2.4 Event-Driven Impulse-Based Method 



l30ll defines the work done by collision impulses using the A/b representation as 
5X^(Ax + 2b). We can use the principle of maximal dissipation to determine the set 
of impulses that maximally dissipate kinetic energy. In particular, we can formulate 
and solve the following optimization problem. 



Quadratic Program 1 


Minimize x^ ( Ax + 2b ) 






Subject to: 






[An B]x + bN>0 




(Noninterpenetration constraint) 


XN>0 




(Compressive force constraint) 


K > 1"''xa, 




(Friction cone constraint) 


^^x\+li^{\>\.+h\) > x^^^+xl^ 


\fie \...n. 


(Coulomb/viscous 
friction constraint) 



where A 



Aw B 

RT At 



is partitioned into n x « block A.^, n x 2n block B, and 
2n X 2n block Aj-. Similarly, x and b are partitioned into x = [xa? xj-j xt, ] and 
b = [bAT bj-j br2 ] , respectively. Scalar K is the sum of the normal impulses that 
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describe the minimum kinetic energy solution when the tangential impulses are zero; 
this scalar is described further in Sec.l2.6l 



2.5 Time Stepping Method 

Equivalent in spirit to the event-driven method is the time-stepping method, which, 
like that of ||4|], is a semi-implicit scheme and, like that of [451 . adds contact con- 
straint stabilization to the dynamics equations. 



Quadratic Program 2 


Minimize 


^v'+i^Mv 


+1 




Subject to: 










Nv'+' > 




(Noninterpenetration constraint) 




Jv'+i=0 




(Bilateral joint constraint) 




c„>0 




(Compressive force constraint) 




l^c, < K 




(Friction cone constraint) 


^ihl + 






(Coulomb/viscous 


ti^[Si{y'+hM- 


lk)l2 + 




friction constraint) 


H^[Ti{V + hM 


r\)f>cl+cl 


\/iel...n 





where S,- and T, refer to the ;* row of S and T, respectively. Unlike the event- 
driven method, the time-stepping method includes a constraint for bilateral joints 
in case the method is used with maximal-coordinate formulated articulated bodies; 
the A/b representation implicitly encodes such constraints into the matrix / vector 
formulations. 



2.6 Solving the Models 

Both of the contact models introduced in the previous sections are convex and can 
be solved in polynomial time in the number of contacts using interior-point meth- 
ods. Determining the value k" requires solving the models in an initial, frictionless 
phase (phase I). Next, a frictional phase (phase II) is solved. If normal restitution 
is necessary, a third phase is required as wellQ Fortunately, phase I is a quadratic 
programming model with box constraints, and can thus be solved extremely quickly 
using a gradient projection method iSSH . The model for phase I (A/b formulation) is: 



We omit details of this third phase but refer the reader to [14], which describes a Poisson- 
type restitution model that is applicable to our models also. 
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Quadratic Program 3 



Minimize 
Subject to: 



-xlANXN + xlbN 



xn>0 



(Compressive force constraint) 



Given that Aa^ is symmetric and positive semi-definite (follows from symmetry 
and positive semi-definiteness of A), this model describes a convex linear comple- 
mentarity problem (see BlSll . p. 5). As a result, the constraint A^XN + hN > (which 
is equivalent to our non-interpenetration constraint) is automatically satisfied at the 
optimum. This convex LCP always has a solution and we prove that this solution 
does not increase the energy in the system (see Appendix). As phase II cannot in- 
crease the energy in the system our contact models are energetically consistent. 

Phase I has two objectives: determine an energetically consistent, feasible poinQ 
for the nonlinear Quadratic Program 1 and determine K. The value K is determined 
by calculating the sum I^xat, using the result from phase I. Observe that the friction 
cone inequality constraint in Quadratic Program 1 restricts the sum of the normal 
impulses to that determined in phase I; thus, phase II can reorder, transfer, or re- 
move some normal force, but the friction cone is prevented from becoming enlarged 
arbitrarily (i.e. , the normal forces cannot be increased without bound to increase the 
amount of frictional force applicable) in order to decrease the kinetic energy more 
rapidly. 

We note that, although we use a linear complementarity problem formulation to 
show that our contact models are energetically consistent, our contact models do not 
use complementarity constraints: neither phase I nor phase II of the contact models 
explicitly include any such constraint. 



3 Evaluating the Contact Models 

There does not currently exist a standardized set of benchmark models for evalu- 
ating the performance and accuracy of contact models. In order to evaluate contact 
models, previous approaches have used either physical experimentation on a sin- 
gle benchmark scenario (e.g., 114611 ') or pathological computer-based scenarios (e.g., 
i3Cl . l4d . ll3n ') that are known to exhibit certain qualitative behavior in the real world. 
Like ||4|], we do not provide exhaustive experimentation to indicate the predictive 
abilities of our contact model. Instead, we note that our model possesses the follow- 
ing properties, which are also possessed by leading alternative contact models that 
treat multiple contact points simultaneously (e.g., I:4, :45i1 ): 

1. Positive work is not done by the model (energy is not added to the system). 

2. Interpenetration constraints are not violated. 



^ A feasible point is one which respects all inequality constraints. In the case of our con- 
tact model, a feasible point will respect non-interpenetration, compressive normal force, 
summed normal force, and Coulomb and viscous friction constraints. 
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3. Only compressive normal forces are applied. 

4. The friction cone is not enlarged artificially. 

5. The principle of maximal dissipation is obeyed. 

Given that these are the main characterizations of both systems, we can expect 
the emergent behavior of both the complementarity-based methods and our non- 
complementarity-based methods to be similar. Indeed, as F ig, p shows, the method 
of Anitescu and Potra (solved using Lemke's algorithm 125111511 ) produces identical 
results (to numerical precision) to our non-complementarity-based model on at least 
one scenario. 




Fig. 2 Plot illustrating 
accuracy of a box sliding 
down a rarnp under both the 
method of ||4|] and the convex 
optimization-based method 
introduced in this paper. 
Although the two models are 
formulated quite differently, 
the simulated results are 
identical. 



We do not claim that our method is more accurate than that of Anitescu and Po- 
tra or Stewart and Trinkle (notwithstanding the linearized friction cones generally 
employed by those two methods). The advantage of our method lies in the compu- 
tational domain. Lacking complementarity constraints, our model is solvable faster 
than competing methods; additionally, the gradient projection method we use to 
solve the first phase of our algorithm — recall that the first phase of our algorithm 
finds a point that respects properties (l)-(4) above — never fails to produce a solu- 
tion; failure in phase II of our method will only affect the accuracy of the solution 
and will not lead to interpenetration or positive work (energy gain). 

The experiments below showcase these advantages of our method. 



4 Experiments 

4.1 Event-Driven Example: Internal Combustion Engine 

We constructed an inline, four cylinder internal combustion engine in order to illus- 
trate the ability of our method to treat moderate numbers of contacts (between one 
and two hundred) far faster than complementarity-based models. All joints within 
the simulated engine were realized by contact constraints only. The engine was sim- 
ulated by applying a torque to the crankshaft, which caused the cylinders to move 
upward and downward within the crankcase. The crankcase and crankshaft guides 
are stationary within the simulation — they possess infinite inertia, so they do not 
move dynamically — and the remaining parts are all of mass 1 .Qkg. Although the 
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Fig. 3 Frames in sequence from a simulation of the rigid-body dynamics and interactions 
within an internal combustion engine 



masses do not reflect reality, the moment-of-inertia matrix for each part is calcu- 
lated using its triangle mesh geometry via the method of 1290 . Gravity acts along 
the vertical direction of the simulation at 9.%m/s^ and the engine surfaces are given 
zero coefficients of Coulomb and viscous friction: we initially wanted to judge how 
rapidly the polygonal-based geometric representation causes energy loss. 

For purposes of comparison, we used the method of |4[1 with the simplest {i.e., 
pyramidal) friction model. Using the pyramidal friction model resulted in six LCP 
variables per contact; thus, LCPs of order between 600 and 1200 were gener- 
ated. We point out that- although the contact was frictionless- neither model {i.e., 
neither our complementarity-free model nor that of Anitescu and Potra) took ad- 
vantage of that fact: the full frictional models (with zero coefficients of friction) 
were used. 



4.2 Time-Stepping Example: Granular Matter 

We simulated 1001 spheres of radius 0.04m dropping into and settling within a box 
to illustrate the feasibility of our method on large scale simulations. We note that 
similar simulations have been conducted at even larger scales by 14711 : however, 
that work is less general and exhibits several features (e.g., permits interpenetration, 
non-Coulomb friction model) that limit its applicability outside of granular matter 
simulation. Figs. |6] and Q show several snapshots taken from the simulation and 
depict the rapid evolution of the system. 

As Fig. |5] indicates, we tested our time-stepping method against a time stepping 
implementation of the contact model of [4]; we used two different LCP solvers 
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simulation time 



(a) Timings of both methods (0-0.0025s) (b) Convex optimization method timing (0-ls) 

Fig. 4 Computation timings required to solve the contact model for a single cylinder of the 
internal combustion engine using both the method of |]4|] and the method introduced in this 
paper Timings for the Anitescu-Potra method are only provided to 0.0025 seconds of simu- 
lation time; the Lemke-based solver 11911 could not solve the LCP problem to sufficient toler- 
ances to continue the simulation past that point. The PATH solver [21] was able to complete 
only a single iteration of the simulation, and is thus not included in the comparison. 



Anitescu-Potra (Lemke) 

Anitescu-Potra (PATH) 

convex optinnization 




time 

Fig. 5 Kinetic energy of the granular simulation over approximately five seconds of simula- 
tion time, plotted for the Anitescu-Potra |14j| model (using two solvers) and our maximum- 
dissipation-based model. The lack of robustness in the linear complementarity problem 
solvers is evident here, as neither solver for the Anitescu-Potra model was able to simu- 
late to one second of simulation time. The identical system energy for all three approaches 
(up to the failure of the Anitescu-Potra solvers) provides some evidence that the models are 
generating identical results. 



(LEMKE iH] and PATH ^^) though both exhibited issues with robustness. As in 
the previous experiment, the pyramidal friction model was used to effect minimum 
computation time. 
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Fig. 6 Frames from time t = l.95s, t = 1.96i, and t = 1.91 s show the granules dropping into 
the box. 





Fig. 7 Frames from time t = 3.50.S, and t = 3.55s depict the granules settling within the box. 



5 Modeling Joint Friction for Articulated Bodies Formulated in 
Reduced Coordinates 



Many robotics applications simulate articulated bodies in reduced coordinates Il42ll 
for several reasons. The reduced coordinate representation is more amenable to pro- 
ducing matrices useful to roboticists (e.g., the joint space inertia matrix, the end- 
effector Jacobian matrix, etc.) and does not require tweaking parameters to min- 
imize joint constraint violations. The reduced coordinate formulation admits gen- 
erally simple formulae for modeling Coulomb and viscous friction at robot joints. 
From 114 ill , p. 141, the torques at joints due to joint friction can be modeled as: 



T^ =Mvq + AicSgn(q) 



(4) 



where /i,, and jj.c are the coefficients for viscous and Coulomb friction, respectively. 
The issue with this model is that it tends to make the differential equations stiff 
(i.e., difficult to solve numerically) using even moderately large values of jJLc and 
jU,,; this statement is particularly true for jj.c, which uses the discontinuous signum 
function. The practical effect of this issue is that either extremely small integration 
steps must be taken or the joint friction must be poorly modeled. We can, however, 
use the principle of maximum dissipation and velocity-level dynamics equations to 
model friction properly; Sec. 15.31 will show that our approach models the viscous 
component of Equation |4] exactly (for sufficiently small coefficients of friction or 
step sizes of the latter), and that the Coulomb component of Equation |4] asymptoti- 
cally approaches our model as the integration step tends to zero. 
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5.7 Maximum Dissipation-Based Joint Friction Model 

Under the principle of maximum dissipation, we wish to find the impulses that min- 
imize the new kinetic energy. The change in joint velocity is given by the formula 
Aq~ H^^x, where q is the joint-space velocity, H is the joint-space inertia ma- 
trix, and X is the vector of applied impulses. Thus we wish to minimize the quantity 
^(q + ziq) H(q + ziq) subject to Coulomb and viscous constraints on the applied 
impulses x. 



Quadratic Program 4 



Minimize 
Subject to: 



-x'H ^x + x'q 

- /ic 1 — jUvq < X < iicl + Hvq (Joint friction model) 



5.2 Solving the Joint Friction Model 



The joint friction model provided above is a quadratic program with box constraints. 
Due to the symmetry and positive semi-definiteness of H (see i20n ). the quadratic 
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Fig. 8 Plots show that joint friction is comparable between the maximal-dissipation-based 
and acceleration level methods; the former method produces this behavior at a step size three 
orders of magnitude larger than the latter. The correct behavior is for the joint position (bot- 
tom) and velocity (top) to remain at zero, given that the coefficient of Coulomb friction is 
so large (5.0). Note that the joint velocity for the acceleration level method is unstable for 
At = 0.01, relatively large for At = 0.001, relatively small for At = 0.0001, and near zero for 
4f = 0.00001. 
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program is convex, and thus a global minimum can be found in polynomial time. In 
fact, programs with hundreds of variables — well within the range of the joint spaces 
of all robots produced to date — can be solved within microseconds using a gradient 
projection method 03511 . 



5.3 Empirical Results 

Figs. [8] and |9] depict the efficacy of using the quadratic program defined above for 
effecting joint friction. We simulated an anthropomorphic, eight degree-of- freedom 
(DOF) manipulator arm acting only under the influence of gravity from an initial 
position. In Fig. [8] coefficients of joint friction of /i^- = 5.0 and jU,, = 0.0 were used, 
while the coefficients of joint friction were jJLc — 0.25 and /i,, = 0.1 in Fig.|9] Two 
methods were used to model joint friction: the acceleration level method described 
in 14 in {i.e., Equation lU and the quadratic programming method based on the prin- 
ciple of maximal dissipation described above. 

Although Figs. [8] and |9] depict only two DOFs of the robot arm (the bleep and 
shoulder, respectively), the results in the plots are indicative of all the DOFs for 
the robot: the acceleration level method converges to the maximal dissipation-based 
method as the step size for the former becomes sufficiently small. Indeed, the top 
plots in the two figures {i.e., the joint velocities) indicate the result for the accel- 
eration level method when the step size is not sufficiently small: the simulation 




Manipulator Shoulder Joint, fi^ = 0.1, fic = 0.25 



2 4 6 

Simulation Time (s) 




Acceleration Level Method 

At = 0.001 
— A« = 0.0001 

Maximal Dissipation Method 
-^At = 0.001 



Simulation Time (s) 



Fig. 9 Plots show that joint friction is comparable between the maximal-dissipation-based 
and acceleration level methods; the former method produces this behavior at a larger step 
size than the latter. Note that the joint velocity for the acceleration level method oscillates 
notably for At = 0.001 and these effects are reduced for At = 0.0001. 
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becomes unstable. Our results indicate that the quadratic programming model based 
on the principle of maximal dissipation is as accurate as an accepted model of joint 
friction but that the former admits simulation step sizes several orders of magnitude 
higher (and, correspondingly, simulation several orders of magnitude faster). 

6 Conclusions 

Roboticists are very familiar with complementarity-based contact models like that 
of 141]; such models have been incorporated into popular simulators like ODE 114311 . 
Consequently, the perception of the accuracy (and inaccuracy) of such models has 
been informed by considerable practice. The complementarity-free models that were 
introduced in this paper do not possess such a track record, and direct, empirical 
comparison between such models is the subject of future work. Nevertheless, the 
principle of maximal dissipation is accepted by the applied mechanics community, 
and we have shown evidence that — at minimum — this principle can be used to sim- 
ulate plausibly mechanisms, granular matter, and joint friction. If the accuracy of 
the complementarity-free models proves acceptable, the computational advantages 
intrinsic to our models will yield considerable speedups in robotic simulation. Fi- 
nally, we argue that, even if the models presented in this paper are found to be poorly 
predictive (compared to complementarity-based models), Chatterjee's work makes 
it clear that non-complementarity-based contact models should be investigated 
further. 
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A The LCP Determined Feasible Point Is Energetically 
Consistent 



We now prove that the solution determined in phase I in Sec. 12. 61 is energetically 
consistent. Working from [|30i 1, the equation for work done by collision impulses is 

izT(Az + 2b). 



The proof relies upon the linear complementarity solver finding the solution y, 
which yields y^(AAfy + bAf) — 0. Given the linear complementarity condition 

y^(AAfy + bA^) — 0, we are left with 



Given that z = 



we prove jy''' {A^J + 2bN) < 0. 



y^b^ < 0. 

The above equation must hold, because y^(AAfy + bA^) = and because A^r is sym- 
metric, positive semi-definite, which implies that y^A^ry > (and therefore, that 

y^bw < 0). 



Energy-Based Modeling of Tangential 
Compliance in S-Dimensional Impact 



Yan-Bin Jia 



Abstract. This paper studies modeling of tangential compliance as two rigid 
bodies collide in the space. Stronge's spring-based contact structure [131 
pp. 95-96] is extended to three dimensions. Slip or stick is indicated by 
the tangential motion of a massless particle connected to the contact point 
(viewed as an infinitesimal region) on one body via three orthogonal springs. 
We show that the effect of tangential compliance can be analyzed using nor- 
mal impulse rather than time, contrary to a previous claim by Stronge. This 
is primarily due to the ability of updating the elastic energies of the three 
springs without knowledge of their stiffnesses or length changes. The change 
rates, nevertheless, are computable. So are sliding velocity and tangential im- 
pulse. The latter is then integrated into impact equations and contact kine- 
matics, making the whole system driven by normal impulse alone. Examples 
include a ball and a pencil bouncing on a table, and a masse billiard shot. 
The theory has potential impact on impulsive robotic manipulation in which 
the ability to deal with friction and compliance is vital for skillful maneuvers. 

1 Introduction 

Impulse-based manipulation is an area in robotics where very little work [51 
I14| is known. An impulsive force has very short execution time, and thus good 
potential for improving task efficiency. Its use could considerably simplify the 
robotic mechanism needed to perform a manipulation task, while avoiding 
uncertainties accumulated over repeated complex operations. The primary 
reason for the lack of research attention is possibly because the foundation of 
modeling rigid body impact is not fully developed and the existing theories 
often seem either too simple to be realistic or too complex to be applicable, 
especially in the presence of friction and compliance, not to mention nonlinear 



Yan-Bin Jia 

Department of Computer Science, Iowa State University 

e-mail: | j iaQcs . iastate . edu| 



D. Hsu et al. (Eds.): Algorithmic Foundations of Robotics IX, STAR 68, pp. 267- f284| 
springerlink.com (c) Springer- Verlag Berlin Heidelberg 2010 



268 Y.-B. Jia 

viscoelastic effects. Discrepancies often exist between an introduced theory 
and ttie findings from an experiment intended for its validation. 

Before presenting some related work on impact mechanics, we give a brief 
review of rigid body impact. Suppose during an impact the ith contact force 
/^ is applied at the location r^ on a body. Integration of the acceleration 
equation V = dV /dt — ^^ fi/'m over the impact duration At yields the 
total impulse ^^ li = mAV . Similarly, integrating the angular acceleration 
equation ^^ n x f^ — Qd) + lo x Qcj, where Q is the body's angular inertia 
matrix, we obtain J^i fi '^ li = QAu) since u is bounded and At -^ 0. The 
following linear impact equations relate the body's velocities to individual 
impulses: 

AV = -yi, and Au^^Q-^YinxI,). (1) 

i i 

An impulse / can be decomposed into a component of magnitude /„ along 
the contact normal and a tangential component /j_. The normal impulse /„ 
increases during both compression and restitution phases of impact. The ratio 
of its amount of accumulation during restitution to that during compression 
is a constant under Poisson's hypothesis. In solving an impact problem, /„ is 
often treated as the variable [3] with whose growth the velocities, the contact 
mode, and the impact phase are updated. 

The tangential impulse J_l, meanwhile, depends on the sequence of contact 
modes that occur during the impact. If the contact is sliding, the differential 
accumulations dl \^ and d/„ are related under Coulomb's law of friction, with 
the former opposing the instantaneous slip direction. If the contact is sticking, 
d/j^ is in a direction to counter the tendency of slip. As the direction varies, 
the tangential impulse accumulates along a plane curve, and the total impulse 
along a space curve. A closed-form solution rarely exists. 

Efforts on impact analysis have struggled over the consistencies between 
laws of Coulomb's friction and energy conservation, and Poisson's impulse- 
based hypothesis of restitution. Routh's graphical method [TU] to construct 
the impulse trajectory has proven successful for analyzing 2-dimensional im- 
pacts, and has been later extended by various researchers [H [13 [T]. For 
3-dimensional impact, Darboux [3] was the first to describe impact dynamics 
in terms of normal impulse in the form of a differential equation. His result 
was later rediscovered by Keller f5^ who also used the equation's solution to 
determine the varying slip direction. 

The above efforts have neglected the effect of tangential compliance and 
assumed that all work done by the tangential reaction force is lost to friction. 
When tangential compliance is not negligible, however, part of the work is 
converted into recoverable internal energy, despite the loss of the remaining 
part to friction. Approaches [HE], designed to produce a ratio of tangen- 
tial to normal impulse equal to the coefhcient of friction, did not exactly 
follow Coulomb's law of friction. Stronge [13] developed a lumped parameter 
representation of compliance, and applied a time-dependent analysis to track 
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the change in the tangential velocity during a collision. His model could 
predict slip or stick at the contact under Coulomb's law. However, without 
knowing the duration of impact, the analysis can only be used to perceive 
contact modes qualitatively rather than to carry out specific computation. 

Computation of tangential impulse is the key to solving an impact problem 
and the focus of this paper. We extend the structure of Stronge's linear model 
of planar impact with compliance J13', pp. 95-96] to develop a theory for 3- 
dimensional impact that is based on normal impulse only and consistent with 
both laws of Coulomb friction and energy conservation. 



2 Tangential Impulse 

During a collision of two bodies, the gravitational forces are negligible com- 
pared to the contact force. The configuration can be oriented to keep the 
contact tangent plane horizontal. To model tangential compliance, we first 
extend the planar contact structure used by Stronge [131 PP- 95-96] to three 
dimensions. The "contact point" on the upper body does not directly touch 
the lower body but is rather connected to a massless particle p via three 
springs respectively aligned with the upward normal h and two orthogonal 
tangential directions u and w, all unit vectors, as shown in Fig. (T) The di- 
rection ii is chosen to oppose the tangential component of the initial contact 
velocity Wq, and w = h x ii. All velocities will be measured along these 
directions but not relative to p which may move during impact. 

The contact force F on the upper body is decomposed as F = FuU + 
F^w + FnU, with each component exerted by one of the three springs. The 
impulse / = jFdt also has components /„, /„, I^, respectively in the three 
orthogonal directions; namely, 




Fig. 1 Compliance model for 3-dimensionaI impact. The contact point, initially 
coinciding with the particle p, is blown up into a small region (shown on the right) 
connected to p via three springs. 
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I = Inn + IuU + IwW. (2) 

2.1 Impact Model 

The impact starts with compression and ends with restitution. In the com- 
pression phase, the normal spring compresses and its elastic energy En builds 
up while the normal component Vn of the contact velocity v decreases. 
Compression ends when u„ — 0; at this moment En has the maximum 
value, say, -Emax- During restitution, the normal spring extends, releasing 
an amount e^E^max of the energy, where e G [0, 1] is referred to as the ener- 
getic coejficient of restitution. The conventional kinetic coefficient of restitu- 
tion, introduced by Poisson as the ratio between the normal impulse released 
during restitution to that accumulated during compression, is not consistent 
with energy conservation when the direction of contact slip varies during 
colhsion [II p. 47]. 

We adopt Stronge's explanation [T31 p. 96] for the energy loss (1 — e^)£^niax: 
at the moment restitution starts, the normal stiffness suddenly scales up 1/e^ 
times. Namely, the normal stiffness k during the impact is given by 

, _ / fcoi compression, , , 

[ fco/e^, restitution. ^ ' 

where ko is the original stiffness. Meanwhile, the change n in the normal 
spring length suddenly varies by a factor of e^. The normal contact force F„ 
nevertheless stays the same at this phase transition. 

The two tangential springs have the same stiffness k± which is invariant 
during the impact. The ratio t^q = ko/k± is often considered a constant that 
depends on the Young's moduli and the Poisson's ratios of the materials in 
contactl^ In the analysis below, we will use the ratio 

V' = k/k^, (4) 

where rj = r]o during compression and 77 — r/o/e during restitution. 

Denote by n, m, w the changes of length of the three springs, and 
En,Eu,Eiu the elastic energies they store, respectively. The contact force 
components and the elastic energies are given below: 

Fn = -kn > 0, Fu = -k^u, F^ = -kj_w; (5) 

En = -^kn^i Eu = -k^u^, E.U, = -k^u?. (6) 



1 



For normal indentation by a rigid circular punch on an elastic half space, John- 
son [5] pp. 361-366] showed that t^q ~ 2a~ti ' where v is the Poisson's ratio of the 
half space. For most materials, this ratio ranges between and 0.5 (Wikipedia). 
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Impulse Derivatives 

Our objective is to describe the entire system in terms of the normal impulse In- 
To avoid any ambiguity, from now on the notation '"' will refer to differentiation 
with respect to time, while the notation '" will refer to differentiation with 
respect to /„ (which monotonically increases from zero during impact). 

Combining the first equations in ((5|) and ^ , the change rate of the normal 
impulse /„ over time can be described in terms of the elastic energy En'. 



in = din/dt = Fn = V^kE,,. 



(7) 



The derivative is well-defined at the impact phase transition where Fn stays 
continuous. Meanwhile, from h = Vn, the normal contact velocity, and /„ = 
Fn = —kn we obtain that £'„ == knh — — Wn-^n, thereby the derivative 



-E„ — dEn/dIn — En I In — 



(8) 



Similarly, from the other two pairs of equations in Q and ^ we obtain the 
change rates of the two tangential impulses: 



/„ 



F.,, 



-a 



\/2k±Eu and I^ :^ F^ ^ -l3\/2k±Eu 



(9) 



where a and /3 are the signs of the length changes of the u- and w-springs, 
i.e., 

lifw>0, f lifu;>0, 

-1 if w< 0; ^~ 1 -1 if u; < 0. 



(10) 



Equation Q is important because it allows us to convert a derivative with 
respect to time into one with respect to the normal impulse /„ simply by a 
division over \/2kE„. As an illustration, we have 



/: = 



kj_Eu 




and /' = — 



P 




(11) 



/„ V kEn ?7 V ^n V 

In fact, the stiffnesses k and k± will always occur together in the ratio form. 



2.2 Contact Modes 

The contact velocity v of the two bodies is ob- 
tained from their velocities and angular velocities 
as well as the locations of contact on each body, 
based on the contact kinematics. This will be il- 
lustrated in the examples in Section [31 For now 
we just assume that v is provided, and denote its 
tangential component as v±. Then the velocity of pj„_ 2 Sliding velocity of 
the particle p is the contact particle p. 
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Vs = v± — uii — WW. (12) 

When Vg = 0, i.e., v± — iiu + ww, the contact sticks. In other words, the 
relative motion of the upper body to the lower body in the contact plane is 
completely absorbed by the two tangential springs so that p has no motion. 
When Vs y^ 0, it is the sliding velocity of the contact. 

When slip happens, under Coulomb's law, the tangential contact force 
F±_ = —fxFnVs, where n is the friction coefficienlo and 

Vs v^-iiu-ww 
Vs — J. — j- = j^ — j^ , from (|12|). (13) 



Since the force also exerts on the u- and w-springs, we obtain 

k±{uU + Ww) ~ flFnVs, (14) 

Substitute p^ into ^^, and rearrange the terms slightly: 

k\ 
v±—uu — ww~ \\vs\\{uu + ww). (15) 

fJ-Fn 

Equation (fTSl) also holds under the sticking contact since it reduces to v± = 
iiu + WW when Vs — 0. Take dot products of both sides of p^ with m, and 
then multiply by w. Similarly, take dot products with w and multiply by u. 
Subtracting the two resulting equations, we have, under both contact modes, 

wii — WW = [v i_ ■ u)w — {vj_ ■ w)u. (16) 



The contact between the two bodies sticks when \/F^-\-F^ < M-Fn, namely, 
by dni), when 

\/il + il < A*4- (17) 

To replace the time derivatives, we substitute equations Q and (|9]) into the 
above, and rearrange the resulting terms after squaring both sides: 

Eu+E^<pi^rfEn. (18) 

When the contact slips, we have the equality 

Eu+E^^^i^rfEn. (19) 

2.3 Stick 

Since Vg — 0, the lengths of the u- and w-springs change at rates 



^ The difference between static and dynamic coefficients is ignored so the value of 
fj, stays constant whether the contact sticks or slips. 
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u = v± ■ ii = V ■ ii and w = v ■ w. (20) 

Their length changes are 

• J /" " 7T /" " IT 1 /" ^-L ,r - 

Udt = / -:- din = I , din = , / , din ' M, 

J In J V2kK V2kJ VK 

W — , / , din ■ W. 

Suppose compression ends at /„ — Ic and restitution ends at /„ = Ir- Ob- 
serving ([3l), we introduce a vector integral 

(j^-^dln, if/„e[0,/c); 

£> = < , r (21) 

[ /o ^ ^ din + Il"e^ din, if In G [/c, /.] , 

so that during impact the following always hold: 

D ii , D w .. 

and w = (22) 



V2fco V2fco 

Instead of computing u and w, we keep track of \/2kou and \/2kQW by up- 
dating £>, without any knowledge about feg- 

The update of Z) is possible because v± is from the contact kinematics, 
and En is by ^. The values of a and /? in (|10p are immediately known from 
the signs of Z) • m and Dw. The integral is also used to conveniently evaluate 
the tangential elastic energies, for 

Eu = Wu^ = I'^iD-iif = -^{D-iif and E^ = ^{D-wf. (23) 
2 4 fco 4775 477g 

;2.4 Slip 

When the contact slips, equation (fTn|) holds. We substitute the spring energies 
Eu and E^u from ([6]) in and obtain, by the use of ^, 

u^+w^ ^2fi^--^En. (24) 

kj_ 



Then differentiate (|24l) with respect to time: 



k ■ 

UU + WW = fl^-ppEn- (25) 

kj_ 



Now, we can solve the spring velocities ii and w from ([T5|) and (P5|) : 
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aii^iq-'E'^^Er.E^ + (v_L • u)E^ - 


- aP{v± ■ 


w)^JEuE.y, 


IJ^WEn 


- al3{v±_ 




^fi'rfE'^y/E^E^ + {vi_ ■ w)Eu 


■ u)y/EuE^ 



(26) 



" = l^WK • ^''^ 

With u and w known, the shding velocity Vg follows from (fT^ . 

The change rates (|2S| and (HTj) do not tell whether the springs are being 
compressed (e.g., m < 0) or elongated (e.g., u > 0). Since 



we introduce two integrals Gu and Gw, where for p = u,v, 

Gp= < r "^ r . 29 



Comparing these two equations with psp . G^ = \/2kou and G^, = \/2kow. 

The two integrals G„ and G^, are used to not only track the signs of u and 
w but also update tangential elastic energies as follows: 



1 y^2 ^2 

' ' -§ and E^ = p^. 



Eu^^k^u^ = P^ and E^ = ^. (30) 

2 4?7^ 477(j 



;2.5 Contact Mode Transitions 

At a contact mode switch, we need to initialize the integrals D or Gu and 
Gw in order to track whether the tangential springs are being compressed or 
extended and update E^ and E^, during the next contact mode. 

Stick to Slip 

The contact point switches its mode when F^ + F^ = A*^^nj i-^-i when (fTOj) 
holds. We initialize the integrals for slip using ([5n|l : 

Gu = 2arioy/E^ and G^ = 2[iirioy/E^, (31) 

where i^u, -E^, a and /3 inherit their values from just before the change of 
contact. 

Slip to Stick 

The contact switches from slip to stick when the sliding velocity Vs vanishes, 
that is, when 

v± = iiii + WW (32) 
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from (dH . Equations ^^ imply that 



D = y/2k){uu + wu) = y/2h) i aJ—^u + pJ—^w j by <^ and ^ 
= 2?7o (a^/Ku + P^/E^w) ■ (33) 

2.6 Impact Algorithm 

The system of impact equations does not have a closed-form solution in gen- 
eral. Simulation is carried out via numerical integration over /„ with some 
step size, say, h. The pseudo-code is given below. 

initialization 
din <— h 

while (compression or En > 0) do 
if D • n = 

then compression ends 
if contact sticks 

then update D according to (|21|l and Eu,Ew according to (|23p 
if slip starts by (|19l) 

initialize Gu and Gw as (|31|l 
else evaluate ii, v according to (I26p . (|27p 

update Gu, Gw according to (I29p and Eu,Ew according to (|30|l 
if stick starts by (|32)l 
initialize D as (|33|l 
evaluate /„ and /^ as (|lip 
I ^ I + dIn- (iLu + iLw + fi) 
update V and a; using impact equations (ITJ 
update V using V and a; according to contact kinematics 

En ^ En - Vndin by ((Hjl 



2.7 Impact Initialization 

To start the algorithm, we need to initialize the contact mode, the normal 
and tangential elastic energies, the integral D or Gu, Gw, accordingly, and 
the tangential impact. Let the initial contact velocity be Vq = vonn + Va±. 
By ([HI we have that E'^^{0) — — uon and En{h) w —VQnh. 

Stick or Slip? 

We first assume that the impact starts with stick, and apply our analysis 
from Section \T^ to derive a condition on Vg. Here we look at a small period 
of time At after the impact begins. The force on the u-spring is 



.At 




pAt 


1 udt = 


-fc,i 


/ {v 







h 



In = -k^u = -k±_ / udt = -k± / {v -u) dt, by (gOI- (34) 
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Similarly, we obtain the forces exerted by the other two springs: 

pAt pAt 

Iw = —kj_ / {v ■ w) dt and /„ = — fco / {v ■ fi) dt. (35) 

Substitute the above three time derivatives into the sticking contact condi- 
tion p7)) and move the integral over {v ■ n) to the left side of the resulting 
inequality: 



i-At, -.^ ~ , , - \ -- ,,ii rAt 



11 L (v-u)u + (v-w)wdt\\ L \/(v ■ m)2 + (v ■ w)'^ dt 

lim -^^ — = lim — — ^— 

■^*-o -J^\vn)dt ^t^o -jf\vn)dt 



\/(«(K~^F+T^^O~^«^ fco 2 

The first equation above follows from that v ■ u and v-w do not changes signs 
within At. Hence we infer that the impact starts with a sticking contact if 

{va ■ iif + (vq ■ wf < ti^vti^o ■ fif, (36) 

or a sliding contact if 

(-^o • m)' + {vo ■ wf > fi^voivo ■ fif- (37) 

Initial Stick 

Using a similar approach, we can obtain the initial values of the derivatives 
of the tangential impulses when the contact sticks at the beginning: 

4(0) = ^ ■ r and 7(0) = ^ -. (38) 

Next, we substitute _E,^(/„) k, — won into the integral (|2ip over [0, h]: 



1 ,, 1 



D{h) = / , ^ ^, „^ din ■ Vol_ ~ 2y/-V0nIn 

Jo y/ -VQnIn + 0(ll) VOn 



•■woi 





, h 

= -2^/ t)o±, smce Won < 0. 

VQn 

The initial elastic energies Eu{h) and E^ih) of the tangential springs are 
then evaluated according to (|^5|) . 

Initial Slip 

When the contact initially slips, the impulse derivatives follow Coulomb's 
law; namely. 



Energy-Based Modeling of Tangential Compliance in 3D Impact 



277 



-^«(0) = -M-n r-^-fiVa-u and I'^{0) = -^Vq ■ w. (39) 

The sliding velocity Vg must have the same direction as the direction t of the 
relative tangential velocity Vo±. Substituting ^ into (fTijl for F„, we write 
down the changes of length of the u- and w-springs, and obtain E^ and Ey^, 
from ([HI, and G„ and G^ from (150]) . 



3 Examples of Bouncing 

This section demonstrates incorporation of tangential impulse into impact 
equations ((T]) using two examples. We look at bounces of a ball and a pencil, 
which result in planar and space impulse curves, respectively. 



3.1 Ball 




As shown in Fig. [3l a ball at initial velocity Vq 
and angular velocity Wq collides with a still ta- 
ble. Let r be the ball's radius and m its mass. 



Hence its angular inertia 



Denote by z 



Fig. 3 Bouncing ball. 



the upward contact normal, and I the impulse 
exerted by the table on the ball during the col- 
lision. The velocity equations ((T|) specialize to 



Vo 



I 

m 



and 



(jja 



2rnr 



-z X I. 



We obtain the contact velocity and its tangential component: 

Iz 7 

V = V + (jj X {-rz) = Vo + —z + -— /_L, 

m 2m 



V± = Vq± 



7 

J 

2m 



where t^o and Voj_ are their initial values, and I — IzZ + I±. We also obtain 

Theorem 1. During the collision of a ball with a still table, the tangential 
impulse Ij_ is collinear with the initial tangential contact velocity V()j_ . 

The proof is by induction over the number of contact mode transitions. De- 
tails are omitted due to lack of space. 
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Impulse Curve 

Theorem [T] states that the impulse curve / lies in the ver- 
tical plane spanned by the z-axis and foJ-D So we can 
conveniently place the origin at the contact point, and the 
a:-axis in the opposite direction of Vo±. The x-y-z frame 
is identified with the n-u-w contact frame for tangential 
impulse in Section [21 

For simplicity, consider m — \ and r = 1. Let the coef- 
ficient of friction be /x = 0.4, the coefficient of restitution 
e — 0.5, and Poisson's ratio of the ball v = 0.3. Here, 
we use r^g = (2 — v)/{2 — 2v) for a circular punch on a 
half space [HI pp. 361-366]. Consider Vq ~ (—1,0,-5) 
and Wo = (0,2,0), which yields Yq^ = (-3,0,0). After 
the collision, the ball will bounce backward with a rever- 
sal of its rotation: V = (K, 0, 14) = (0.570982, 0, 2.5) and 
ui = (0, Wy,0) = (0, -1.92746,0). Its total energy wiU de- 
crease from 13.4 to 3.65997. 

Fig. [31 plots the impulse curve, on which the blue and 
black dots mark the ends of compression and restitution, 
respectively, and the two green dots mark the contact 
mode transitions. The impact starts with a slip, changes 
from slip to stick at Iz = 0.62485, ends compression at 1^ — 
starts a reverse slip at Iz = 7.36575, and ends restitution at I; 
mvoz = 7.5. 




Fig. 

curve. 



4 Impulse 



-mvoz 



-(1 



5, 
e) 





Fig. 5 (a) Tangential contact velocity; (b) a;-spring velocity; and (c) varying spring 
length X scaled by \/2ko. The dashed line in (b) marks a discontinuity as reverse 
slip happens. 



It degenerates into a vertical line segment when vo± — 0. 



Energy-Based Modeling of Tangential Compliance in 3D Impact 



279 



-0.4 



0.5 



During the impact, the tangential contact 
velocity and spring velocity, aligned with 
the X-axis, are treated as scalars here and 
denoted v± and i, respectively. As shown 
in Fig. [Sfa), v± starts at —3 and ends 
at 2.49847. The x-spring velocity increases 
from —2.42852 with I^ until it equals v± at 
—2.12528, when the contact switches from 
slip to stick. Fig.[5ljb) shows a sudden change 
of i from 2.59255 to -2.29806 when a slip re- 
versal occurs at Iz = 7.36575. To see why, 
note that under slip i must satisfy (P5|, 
which becomes xx — {ii^k/k'\)Ez. Because 
the transition happens during restitution, 
Ez < 0, so X and x must have opposite 
signs at the moment. However, as shown in 
Fig. ini^b) and (c), both x and x were posi- 
tive before the slip reversal. Hence the sud- 
den change in x. We can show that during stick the massless particle is in a 
simple harmonic motion. 




-0.4 



-0.8 



-1.2 



-1.6 




(b) 



Fig. 6 Effects of (a) friction and 
(b) Poisson's ratio. 



Effect of Pr let ion 



and ujy reach their extrema 



Fig. |Bl[a) plots the post-impact velocities Vx and tOy as the coefficient of 
friction /i varies from to 1.0, where Vq = (^1,0, —5) and Wo = (0,2,0). 
When friction is low (/x < 0.13), the ball will bounce to the left but keep 
the original clockwise rotation about the y-axis. As fi increases from 0.13 but 
does not exceed 0.16, the ball will reverse its rotation but still bounce to the 
left. As friction becomes higher (/.t > 0.16), the ball will bounce backward 
with a rotation reversal. At /i = 0.36, both v. 
0.57591 and -1.93976, respectively. 

Effect of Compliance 

The dependence of Vx and tOy on Pois- 
son's ratio V over its normal range [0, 0.5] 
is shown in Fig. [HJb). Again, V^ — 
(-1,0,-5) and Wo = (0,2,0). As v in- 
creases from to 0.5, Vx after the impact 
increases monotonically from 0.45729 to 
0.66507, while Uy decreases monotoni- 
cally from -1.64322 to -2.16266. The 
more compliance, the less energy loss. Fig. 7 Pencil with velocities Vo and 

Wo hitting a table: (a) dimensions 

and (b) frames. 
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3.2 Pencil 



We move on to consider another task which many of us may have tried on a 
desk — throwing a pencil and watching how it bounces. Most of the time the 
pencil is thrown with its rubber eraser downward, but here let us consider a 
pencil strike with the pointed end contacting the desk. 

As shown in Fig. [Tl^a), the pencil is modeled as a cylinder with mass mi 
and height hi on top of a cone with mass m2 and height /12, where both 
components have the same mass density. The cross sections of the cylinder 
and the top face of the cone have the same radius r. The pencil's center of 
mass Op is located on its axis of symmetry at distance h above the cone's 
vertex 02, where h — {6hl + 12hih2 + 3/i|)/(12/ii + 4/12)- A body frame 
Xp-j/p-Zp is placed at Op with the Zp-axis aligned with the pencil's axis of 
symmetry. 

The moment of inertia Q of the pencil about its center of mass Op is a 
diagonal matrix with first two principal moments: 



Qii — Q2 



/11+/12/3 



(K 



3r^ + hi 
12 



+ f] + 



3 U 4 



hi) 



where m = mi + 1712 and I = hi/2 + h2 — h. 

We treat the simple case where the pencil lies in a vertical 
plane at the moment of the strike. Let the plane be both the 
x-z plane of the desk frame at the contact point and the Xp-Zp 
plane of the pencil's body frame. See Fig. \T[h). The pencil, 
tilted at an angle 9 just before the hit, has velocity Vq rela- 
tive to the desk frame and angular velocity Wg = (wi, W2, ^3) 
relative to a (fixed) frame instantaneously coinciding with the 
pencil frame. The orientation of the pencil frame in the desk 
frame is described by a rotation matrix R about the y-axis 
through 9. The velocities are determined from the impulse 




Fig. 8 



V = Vo + — and -hzpx (R-^I) = Q(uj-uJo). 
m 

The velocity of the contact point during the strike is linear in /: 

Ix sin^ 9 — Iz sin 9 cos 9 



v = Vo 




ly I . (40) 

-/t sin 9 cos 9 + 1, cos^ i 



Specifically, we simulate a pencil with m = 1, r = 1, /ii = 3, and /i2 = 0.5. 
Let ji = 0.8 and e = 0.5. The pencil tilts aX 9 — 7r/3, and strikes the 
desk with velocities Vq = 5(— cos |, 0, — sin|) and u)q — (1,0.5,0.5). The 
post-impact velocities are V == (-1.80954,-0.546988,1.2076) and Wo = 
(0.09957,-0.04174,0.5). The pencil bounces upward with reduced motion 
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along the negative a:-direction. It has gained a new motion along the neg- 
ative y-axis. The angular velocity has changed along both Xp- and j/p-axes 
in the pencil's body frame. Its component along the Zp axis, i.e., the axis of 
symmetry, remains as 0.5, due to zero torque about the axis during impact. 

Fig. [5] plots the impulse curve, which 
grows from the origin to (2.52059, 
-0.54699, 3.7076). The contact point shdes 
during impact. The impulse projection 
onto Ix-Iy plane (see Fig. [9| is also a 
curve. This shows that the sliding direc- 
tion was constantly changing in the impact 
Fig. 9 Tangential impulse. duration. 





4 Simultaneous Collisions with Compliance 

In [7] , we introduced a method to model simultaneous collisions of multiple 
objects based on transitions between states that characterize different com- 
binations of the objects instantaneously in contact. Tangential impulses due 
to friction were then treated naively without considering compliance. As a 
result, the effect of a skillful shot like a masse one could not be modeled based 
on a measured input. 

Tangential impulse due to compliance easily applies to 
simultaneous impacts. We here illustrate over a masse shot 
(see Fig. [TUI) . The cue stick hits the ball at a point with 
outward normal n, and the ball in turn hits the table with 
upward normal z. We set up a local frame at the cue-ball 
contact with axes h, u, and w, and another frame at the 
ball-table contact with axes z, x, and ij. 

The impulses exerted by the ball on the cue stick and by 
the table on the ball are respectively Ji — luU+IwW+Inn 
and 1 2 = IxX + lyij + IzZ. According to [7j, we can treat 
exactly one of /„ and Iz as the variable within a state, while the other as 
a dependent. Now, use the method in Section [5] to obtain 7„ and I^ from 
/„, and Ix and ly from I^. Next, use Ji and 1 2 to update all velocities, thus 
closing the loop. 

We have designed a mechanical cue stick [7] which allows us to calculate the 
velocity of the cue tip before a shot. After the shot, the x- and y- components 
of the ball's velocity and angular velocity can be recovered from its trajectory 
via some involved steps. 

Fig. fTTl a) shows the trajectory fit over positions (red dots) sequenced from 
the video of a shot executed at a point near the top pole of the cue ball. The 
values of eleven relevant physical constants are omitted due to lack of space. 
The estimated velocity and angular velocity of the cue ball immediately after 
the shot are (-1.65389,-0.36149,^ and (24.2768, 80.537,_), respectively. 



Fig. 

shot. 



10 Billiard 
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Fig. 11 Billiard trajectories (a) recovered from the video of a real shot and (b) 
predicted by the impact model with cue velocity estimated from the same video. 



Fig. iriT b) shows the predicted ball trajectory by the impact model un- 
der the same shooting condition. The ball gets velocity (—1.65774,0.24380, 
0.73265) and angular velocity (15.9378, 52.9882, -3, 67556). Despite the small 
differences in the two pairs of velocities, the resulting trajectories differ 
widely. This is in part due to inaccurate measurements of related physi- 
cal constants (including a guess over the relative stiffness of the cue-ball 
and ball-table contacts), the point-based impact model, and uncertainties of 
the shot. 



5 Discussion 



The key of the introduced compliance model for impact lies in that the elastic 
energies stored in the three orthogonal springs can be updated as functions of 
normal impulse without knowledge about their stiffnesses or length changes. 
The change rates of the spring lengths are nevertheless computable, so is 
the sliding velocity. Contact modes are decided from elastic energies rather 
than forces. All these make computation of tangential impulse possible, with 
normal impulse the sole variable of the impact problem. 

In [I3l , Stronge claimed that the frictional energy loss depends on the slid- 
ing speed (i.e., the particle velocity v^), correcting his earlier statement that 
it depends on the tangential relative velocity v. But it was not until the re- 
cent work by Hien [F was the formulation of frictional dissipation completed. 
In our work, such dissipation is accounted for as the energies E^ and E^ are 
stored and released by the two tangential springs. 

In their study |16| of a dimer bouncing on a vibrated plate, a similar 
differential impulse relationship is set up based on the ratio of the potential 
energies stored at the contact points. Coulomb's friction law is applied over 
the corresponding impulse increments. In our work, the friction law is applied 
in the forms ([T5|) and (fTO)) over the elastic energies stored at the contact. 
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The highly nonhnear nature of impulse accumulation (as shown in the 
pencil and the billiard examples) due to contact compliance would present 
an obstacle for formulation of a linear complementarity problem (LCP) as 
in |12j with a time-stepping solution. Our method is more accurate since it 
does not approximate the contact friction cone as a polyhedral cone, and also 
more efRcient without having to repetitively solve linear systems. 

The next step is to further integrate the compliance model with our pre- 
viously developed model for simultaneous impacts U- Modeling of billiard 
shots provides a challenging test bed for meshing the two theories. A longer 
term objective is to apply the theory to impulsive robotic manipulation. 

Acknowledgements. This work was sponsored in part by Iowa State University. 
Part of the work was performed in China. The author would like to thank Matt 
Mason for a pointer to Stronge's work on modeling compliance in impact mechan- 
ics, and Mike Erdmann for his encouragement. He is grateful to the anonymous 
reviewers for their valuable and constructive feedback. 



References 

1. Ahmed, S., Lankarani, H.M., Pereira, M.: Frictional impact analysis in open- 
loop multibody mechanical systems. J. Applied Mech. 121, 119-126 (1999) 

2. Brach, R.M.: Tangential restitution in collisions. In: Schwer, L.E., et al. (eds.) 
Computational Techniques for Contact Impact, Penetration and Perforation of 
Solids, ASME AMD, vol. 103, pp. 1-7 (1989) 

3. Darboux, C: Etude geometrique sur les percussions et le choc des corps. Bul- 
letin des Sciences Mathematiques et Astronomiques, deuxieme serie, tome 4, 
126-160 (1880) 

4. Han, I., Gilmore, B.J.: Impact analysis for multiple-body systems with friction 
and sliding contact. In: Sathyadev, D.P. (ed.) Flexible Assembly Systems, pp. 
99-108. American Society Mech. Engineers Design Engr. Div. (1989) 

5. Hien, T.: A correction on the calculation of frictional dissipation in planar 
impact of rough compliant bodies by W. J. Stronge. Int. J. Impact Engr. 37, 
995-998 (2010) 

6. Huang, W.H., Mason, M.T.: Mechanics, planning, and control for tapping. 
Int. J. Robotics 19(10), 883-894 (2000) 

7. Jia, Y.-B., Mason, M., Erdmann, M.: A state transition diagram for simul- 
taneous collisions with application in billiard shooting. In: Chirikjian, C, et 
al. (eds.) Algorithmic Foundations of Robotics VIII, pp. 135-150. Springer, 
Heidelberg (2010) 

8. Johnson, K.L.: Contact Mechanics. Cambridge University Press, Cambridge 
(1985) 

9. Keller, J.B.: Impact with friction. J. Applied Mech. 53(1), 1-4 (1986) 

10. Routh, E.J.: Dynamics of a System of Rigid Bodies. MacMillan and Co., Bas- 
ingstoke (1913) 

11. Smith, C.E.: Predicting rebounds using rigid-body dynamics. ASME 
J. Appl. Mech. 58, 754-758 (1991) 



284 Y.-B. Jia 

12. Stewart, D.E., Trinkle, J.C.: An implicit time-stepping scheme for rigid body 
dynamics with inelastic collisions and Coulomb friction. Int. J. Numer. Methods 
Engr. 39, 2673-2691 (1996) 

13. Stronge, W.J.: Impact Mechanics. Cambridge University Press, Cambridge 
(2000) 

14. Tagawa, K., Hirota, K., Hirose, M.: Manipulation of dynamically deformable 
object using impulse-based approach. In: Zadeh, M.H. (ed.) Advances in Hap- 
tics, pp. 16-33 (2010) 

15. Wang, Y., Mason, M.T.: Two-dimensional rigid-body collisions with friction. 
J. Applied Mech. 59, 635-642 (1991) 

16. Zhao, Z., Liu, C, Brogliato, B.: Planar dynamics of a rigid body system with 
frictional impacts. Proc. Royal Soc. A: Math. Phys. Engr. Sci. 465, 2267-2292 
(2009) 



Sampling-Diagram Automata: A Tool for 
Analyzing Path Quality in Tree Planners 

Oren Nechushtan^*, Barak Raveh^'^*, and Dan Halperin^ 



Abstract. Sampling-based motion planners are a central tool for solving motion- 
planning problems in a variety of domains, but the theoretical understanding of their 
behavior remains limited, in particular with respect to the quality of the paths they 
generate (in terms of path length, clearance, etc.)- In this paper we prove, for a 
simple family of obstacle settings, that the popular dual-tree planner Bi-RRT may 
produce low-quality paths that are arbitrarily worse than optimal with modest but 
significant probability, and overlook higher-quality paths even when such paths are 
easy to produce. At the core of our analysis are probabilistic automata designed 
to reach an accepting state when a path of significantly low quality has been gen- 
erated. Complementary experiments suggest that our theoretical bounds are con- 
servative and could be further improved. To the best of our knowledge, this is the 
first work to study the attainability of high-quality paths that occupy a significant 
(non-negligible) portion of the space of all paths. The formalism presented in this 
work can be generalized to other algorithms and other motion-planning problems 
by defining appropriate predicates, and pave the way to deeper understanding of 
hallmark planning algorithms. 



1 Introduction 

The problem of finding collision-free paths for moving objects among obstacles, 
known as the Mover's problem, is P-Space hard when the number of degrees of 
freedom is considered a part of the input 1 24] . The problem of finding optimal paths 
with respect to some quality measure (such as path length, path clearance, energy. 
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etc.) is even harder, and was shown to be NP-hard even for simple special cases of 
the Mover's problem |[lil3ill8l|23|]. Although these hardness results are discouraging 
in terms of worst-case analysis, in practice, motion planning is effectively used for a 
wide range of applications in diverse domains ll4l ll6l] . In particular, the wide use of 
sampling-based algorithms, such as PRM |12l], RRT flTI], EST (gfl and their many 
variants, has extended the applicability of motion planners beyond the restricted 
subset of problems that can be solved efficiently by exact algorithms. 

Unfortunately, the theoretical understanding of sampling-based algorithms falls 
far behind the practical one, and in many senses, their proper usage may still be 
considered somewhat of an art. The most important results that exist to date shed 
light on the asymptotic performance of the PRM [11] and RRT 1,14,1 algorithms, 
which were both shown to be probabilistically complete i9l,lllL ll4ll . However, the 
required running time for finding a valid solution, if one exists, cannot be computed 
for new queries at run-time in practice. 

In many applications, we require not only a valid solution, but also a high-quality 
path. Several heuristics were devised for improvi ng the q uality of output paths gen- 
erated by existing motion planners (e.g., 17., SL IisLboI 12211 ). While these approaches 



are theoretically motivated to some extent, the actual performance of sampling- 
based algorithm with regard to path quality is still poorly understood, even in very 
simple settings. Recently, Karaman and Frazolli|10] analyzed the convergence of 
sampling-based planners to optimal paths of zero-measure in the configuration, and 
have shown that at a reasonable cost of additional running time, modified variants 
of the RRT algorithm reach an optimal solution with probability one, as the num- 
ber of samples goes to infinity. They also showed that the original RRT algorithm 
converges to sub-optimal solutions with probability one (under certain assumptions 
on the quality function used to measure path quality). However, this result does not 
indicate how bad the sub-optimal solution is - hypothetically, its quality may be 
(1 — e) that of an optimal path, for some very small £. 

In this study, we rigorously analyze a simple family of toy-examples, which we 
call The Promenade problem (Figure [T]). In this setting, the shortest possible path 
is highly accessible, without narrow paths leading towards it. We prove that the 
widely used dual-tree variant of the Rapidly-exploring Random Trees (RRT) al- 



gorithm, Bi-RRT ll4j], may take inferior paths that are arbitrarily worse than the 
optimal path and miss any higher-quality path with small but significant probability. 
Importantly, we provide a uniform probabilistic bound that holds for any instance 
of the Promenade problem (Figure[T]). Our work, which is to the best of our knowl- 
edge the first of this kind, studies the attainability (or rather the non-attainability) of 
high-quality paths that occupy a non-negligible portion of the space of all paths, in 
contrast to the recent work by Karaman and Frazzoli jlOll . 

As a more general contribution, we present a novel automaton-based approach 
for analyzing the performance of the Bi-RRT algorithm in a probabilistic manner, 
by assigning probability bounds to transitions between automaton states, which cap- 
ture the progress of the algorithm. We believe that our new approach could greatly 
improve the currently limited theoretical understanding of sampling-based motion 
planners, by introducing a formalism for describing their behavior. 
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Outline: In Section |2] we briefly reiterate the Bi-RRT algorithm and define the 
Promenade family of obstacle settings. In Section [3] we introduce the Automaton 
of Sampling-Diagrams (ASD) formalism for describing the progress of the Bi-RRT 
algorithm, and in Section |4] we instantiate an appropriate ASD that proves a prob- 
abilistic lower-bound on the performance of Bi-RRT, stating and proving the main 
theorems of this manuscript. In Section[5]we empirically demonstrate that our bound 
holds for a standard implementation of Bi-RRT, and seems a fairly conservative one, 
and in Section[6] we discuss possible extensions of our method. 



2 Problem Setting 

2. 1 The Bi-RRT Algorithm 

Let 12 be a motion-planning problem (defined by a mobile object and a workspace 
cluttered with obstacles), and let {0,qi ,q2) be the motion-planning query of moving 
the object between configurations 15^1 and 15^2 among the obstacles. In this work, we 
analyze the performance of the Bidirectional-RRT algorithm variant as described 
in the book by LaVallePla, p. 236], with slightly modified notation (Algorithm [l]). 
The start and goal configurations, qi and 172 respectively, serve as the roots of the 
trees Ti and T2. In each iteration ;, the algorithm extends one of the trees Teur within 
the obstacle-free portion of the configuration space, C free, towards a random sam- 
ple a,, drawn from the configuration space C (lines 3-7). The other tree Tother is 
then extended towards T^ui (lines 8-12). The algorithms breaks if the two trees were 
successfully connected and the query solved (line 13). At the end of each iteration, 
the tree to be extended may be swapped with Tother (line 14), most commonly in 
order to maintain a balance between the size of the two trees. However, the analy- 
sis we make here would hold for any possible tree-swapping strategy (see below). 
The pseudo-code for Bi-RRT is given in Algorithm [T] 5, denotes the swath of 7; 
(see below). 

2.2 Subtle Implementation Issues of Bi-RRT 

1. Tree swaths: The swath Si of the tree 7] is an image of the vertices and edges of 
the tree in the configuration space Plw, p. 236]. In a full setting, the swath is the 
union of line segments that correspond to T, edges. In an approximate setting, it 
is reduced to a point subset (e.g., configurations of tree vertices). 

2. Distance metric: The nearest function is defined as the nearest point according 
to some distance metric, with respect to the full swath, or to an appropriate point 
subset|16, pp. 230-234]. 

3. Impact of tree-swapping strategy: The choice of tree to be extended in each iter- 
ation (swapping strategy) may alter the algorithm output solution and complicate 
its analysis. The strategy of balancing the sizes of the two swath suggested in AI- 
gorithm[T|(line 14) is not always trivial to implement - in particular for the case 
of a full swath, adding even a single sample a to a tree may considerably increase 
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Algorithm 1. Bi-RRT(f2, qi , q2, M) 

1. ri.init(gi); ra.initfe); Tiur = ^i; Tother = Tr, 

2. for i = 1 to M do 



3. 


q„ ^ nearest(5cui-, (T,); 


4. 


qs ^- stopping-configuration((7„,(7,); 


5. 


itq.j^qn then 


6. 


rcur-add.vertexCgj); 


7. 


rcur.add.edge(g„, q,); 


8. 


q'„ ^ nearest(5other, 9i); 


9. 


q', ^- stopping-configuration((7j,,(7s); 


10. 


If?; 7^^; then 


11. 


rother-add_vertex(?!;); 


12. 


rother-add_edge(?; , q', ); 


13. 


itq', = q, then return SOLUTION; 


14. 


if \Totha\ < \Tcm\ then SWap(rcur,rother) 


15. 


return FAILURE; 



its swath, by introducing a long line segment towards a. The analysis we pro- 
vide in this manuscript considers any, possibly adversarial worst-case scenario, 
of swapping strategies. 

Stopping-configurations: Given a new sample a, the tree T^ui is extended from 
q,, ^C , the nearest neighbor of a in the tree swath ^cur- However, the new sample 
may not be visible from q^ due to obstacles (formally, configurations are visible 
to one another iff the line segment connecting them is fully contained in C free)- 
The stopping configuration function (line 4) is used to find a node q^ £ C free that 
is visible from q„. The Bi-RRT algorithm adds the stopping configuration, rather 
than the new sample, to the tree (lines 4-6 in Algorithm!!]), but its exact definition 
is left to interpretation. For simplicity of our analysis, we make the plausible 
assumption that the stopping configuration is the farthest visible configuration 
from qn along the line segment from q„ to a. 

Observation 1. Given a new sample G, the Bi-RRT algorithm adds o to the tree 
Tew W^ '■^ visible from its nearest neighbor in Sew 



2.3 The Promenade Motion-Planning Problem 

For some a > 0, let ^a denote a motion-planning problem in M? where we translate 
a point robot in the square configuration space C := [0, a + 2] (for a point robot, 
the configuration space and the workspace are equivalent). The free configuration 
space is C free := <f" \ [1 , « + 1] , that is, there is a single ax a square obstacle in the 
middle of (T , leaving a rectangular ring "promenade" for the robot (see Figure [l]). 
We are interested in bounding the probability that running the Bi-RRT algorithm 
for solving the Promenade motion-planning problem will result in a low-quality 
solution (type-B solution in Figure [U. 
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« = 2 B2 




a = 4 







Fig. 1 The Promenade family of motion planning problems, illustrated for inner-square obsta- 
cles (black) with three different edge-size values (scenes rescaled for illustration purposes). 
In these examples, q\ and q^ lie on opposite sides of the promenade. The ratio \l between the 
path length of type-A (solid line, crossing the regions marked Ai and A2) and type-S (dashed 
line, crossing B\ and S2) solutions, is approximately ^ in this case. Note that the probabilis- 
tic bound obtained by the Automaton of Sampling-Diagrams method for these scenes is not 
sensitive to the size of the inner-square obstacle relative to the bounding box (for all a > 2), 
or to the precise location of qi and q2. In addition, the same bound holds for any, possibly 
adversarial worst-case scenario of tree-swapping strategy (line 14 in Algorithm[TJ. 



3 Introducing the Automaton of Sampling-Diagrams 

We first introduce the formal concepts underlying the construction of an Automa- 
ton of Sampling-Diagrams (ASD), a Finite-State Machine that reads a sequence of 
samples in C , combined with a sequence of tree-swap decisions. The rationale for 
this choice of input is that together, the sequence of tree-swap decisions and sam- 
ples determines the output path returned by Bi-RRT uniquely. This also makes our 
analysis robust to any (possibly adversarial worst-case) tree-swapping strategy. Af- 
ter laying out the necessary formalism in this section, we will construct an instance 
of this type of automaton in Section 31 in which an accepting state is reached only 
if the Bi-RRT algorithm returns a solution of particularly low quality. 

Definition 1. [A Sampling Diagram] A sampling-diagram D^ over the configura- 
tion space C is a partial, possibly overlapping subdivision ofC into three subsets, 
Fi{D(j), F2{D(j), R{D(j) <Z€ . We require R{D(j) to be disjoint from Fi{D(j) and 
FiiDo). 

Definition 2. [Automaton of Sampling-Diagrams (ASD)] An ASD A is a Finite- 
State Machine that reads the composite infinite alphabet (6 x O) for 6 £ {Ti,T2} 
and O £C . Each state s G States(A), corresponds to a sampling-diagram D(j[s], 
which determines the transition function of A as follows: 

• If A moves to a rejecting or an accepting state, it is trapped in it. 

• For any other state s and input character (6,(7): 

1. If O G R{D(y[s\), A moves to a rejecting state. 

2. If6 = Tj and (J G Fi(Dcy[s]), A moves "forward" to a non-rejecting neighbor- 
ing state, denoted by Ni{s). Ni (s) and N2{s) can be the same state. 

3. Otherwise, A remains in s. 
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Fig. 2 An illustration of Aa, 
an Automaton of Sampling- 
Diagrams (ASD) for the 3^a 
Promenade problem with a = 
2, using the £1 norm. The 
transition function is set by 
samples hitting the sampling- 
diagrams of each state. When 
a sample hits the F; region 
while the tree Tj is being 
extended, A^ moves forward 
along the transition edge la- 
beled Tj, whereas hitting R 
moves Ac to rejecting state 
(omitted for clarity). The fig- 
ure is best viewed in color. 



Given a sequence of tree-swap decisions ©,„ — (0i , 62, ■ • • , ©m) £ {7i , 72}'" together 
with a sequence of samples E„, — (ai , a2, . . . , CTm), we say that A reads ©„ and £,„. 
The sampling-diagrams encode the relation between the tree-swap decisions ©„, the 
samples E„, and the automaton transition function, where F and R stand for moving 
"forward" to a non-rejecting state, or moving to a rejecting state, respectively. This 
formalism can be generalized (see Section|6ll. 



3.1 Relating ASDs to Bi-RRT - Swath-Realizing ASDs 

The following definitions relate the Automaton of Sampling-Diagrams, the tree- 
swaths created by Bi-RRT, and the combination of amples and tree-swapping deci- 
sions that make up the input to botll^. 

Definition 3. [Probability Space] Let E,„ := {gi,G2, . . . , cJm) be a sequence of m 
samples chosen independently at random from F, a probability space over the con- 
figuration space C . Fhe probability space of all such sequences E,„ is F"'. 

Definition 4. [Induced Swaths] Let S\ and S2 be the swaths generated by running 
Bi-RRT, with the tree-swapping decisions 0m G {7i, 72}'", used to decide which tree 
to extend in each iteration, and the samples Em G F'". Si and S2 are the Induced 
SwathsofQm andEM, denoted S\[Qm,E,„] and S2[Qim^m]- 



' In order to technically simplify the definitions, we let Bi-RRT run after returning the output 
solutions, but without ever changing the tree-swaths, thereby not affecting the analysis. 
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Definition 5. [Swath Diagram] A Swath-Diagram A over the configuration-space 
C is a partial subdivision of € into two disjoint subsets, A'^ , A^ CC, 

Definition 6. [Realizing Swath-Diagrams] The pair of swaths (81,82) realizes the 
pair of Swath-Diagrams {A\,A2) if: 

1. A'^ ^ <j) ^ Si nAj^ ^ (j) (intersect positive regions), 

2. Si n A[' = (j) (avoid negative regions), 

3. A^ nA^ j^ (j) ^ 8inS2r]Al nA^ ^ (j) ( connect trees). 

Swath-Diagrams encode critical configuration-space regions that the tree swath 
must intersect (A^) or avoid (A^), and regions where the two swaths connect 
{A^ n A2). The latter implies that the motion-planning problem is solved. 

Definition 7. [Swath-Realizing Automaton of Sampling-Diagrams] We associate 
each state s of an A8D A with a pair of swath-diagrams Ai{s) and A2{s). A is a 
Swath-Realizing ASD if: 

• The pair of induced-swaths {8i[0m,^m],S2[0im^m]) realizes the pair of dia- 
grams (^A\{s),A2{s)), for any input 0m G {Ti,T2}'" and Xm G F'" that moves 
A to a regular (non-rejecting, non-accepting) state s, for any m. 

• The Swath-Diagrams of accepting states must be realized when first visited by A. 

The definition of a Swath-Realizing ASD requires that if we run Bi-RRT and ad- 
vance the ASD in parallel, using the exact same input, then the swaths created 
by Bi-RRT will invariantly respect the rules associated with the current automaton 
state. This reduces the problem of analyzing the progress of the Bi-RRT algorithm 
on a given input to the inspection of the swath-diagrams associated with the current 
automaton state, which encode the desired expansion of the tree swaths. 



4 Proving a Probabilistic Bound for the Promenade Problem 

In this section we prove the existence of a Swath-Realizing ASD Aa for the Prom- 
enade family of problems ,^0;, in which the swath-diagrams of accepting states 
encode low-quality solutions and in which only the swath-diagram of the rejecting 
state allows high-quality solutions. An instance of A^ for a = 2 is illustrated in Fig- 
ures|2]and[3] By carefully analyzing the transition probability between states in the 
automaton, we shall lower-bound the probability to return a low-quality solution. 

Definition 8. [< /i-Quality Solution] For < /i < 1, a solution path (Oto a motion- 
planning query (Q,qi,q2) is said to be < ^-Quality ifQ{(o) < nQopt, where Qopt is 
the optimal quality for the query. 

Consider the quality function Q that is inversely correlated with the length of a 
solution path ft) under the ip norm. Formally, Q{(ii) ~ •, 4- — ^. We define the 

following key configuration-space regions (see Figure[Tl). Ai and A2 are the isosceles 
right triangles at the bottom-left and bottom-right corners of (T free, respectively. B\ 
and B2 are 1x1 squares adjacent to the top-left and top-right corners, respectively. 
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Fig. 3 The swath-diagrams 
associated with A^, the 
automaton shown in Figure|2] 
The growth of the tree swaths 
Si (dark-green line) and 52 
(light-blue line), induced 
by running Bi-RRT on the 
same input given to A^, is 
illustrated. A^ is Swath- 
Realizing, as it maintains 
the invariant that each S, 
intersects A^, avoids Af' and 
connects to the other tree 
whenever A^ nA2 7^ (j)- The 
figure is best viewed in color. 



Definition 9. [Type-A and Type-B paths] Type-A solution paths to £^a intersect 
Ai but not Bi. Type-B solution paths intersect both Z?i and B2- Q*a '^'^^ Q*b ^''^ '^^ 
optimal qualities for any Type-A or Type-B paths, respectively. 

Type-B solutions topologically correspond to paths going above the obstacle (see 
Figure [T]l. For given configuration q\ and 172, let jx be defined as: /j = ^ . It is easy 
to position 171 and q2 such that Type-B solutions are < jU-Quality for a small ji. 

Theorem 1. [Path QuaUty in the Promenade Problem] There exists a constant 
Co > 0, such that for any < jU < 1, any a > 2, any tree-swapping strategy and 
using samples drawn from F™. there exist initial and goal configurations q\ and qi 
for which the path (0 returned by Bi-RRTsatisfies: 

Pr[(0 is < ;U-Quality] > cq . 

The proof of Theorem[T]is involved and proceeds in several stages. We first show 
in Theorem l2l (Section 14 . II ) that accepting states of the automaton Aa correspond 
to type-B solutions and that any type-A solution moves Aa to a rejecting state. The 
heart of this proof is in Proposition [T] where we show that Aa is swath-realizing. 
The probability to reach an accepting state (low-quality solution) is analyzed in 
Theorem[3](Section l4.2l ). All this applies to | < /i < 1 and to the £1 norm (as it is 
easiest to illustrate). The extension to any jj. > (paths of arbitrarily low quality) 
and to other £p is discussed in Section l431 
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4.1 Aa Accepts Inputs That Induce < /i -Quality Solutions 

Theorem 2. For all a >2, for all qi &C free enclosed between Ai and Z?i (on the 
left-hand side ofC ), and for all q2 £ C free enclosed between A2 and B2 (on the 
right-hand side ofC ), the automaton A^ (as in Figures\2\and\3} moves to: 

1. A rejecting state on all inputs for which Bi-RRT outputs a type-A solution. 

2. An accepting state only on inputs for which Bi-RRT outputs a type-B solution. 

Proof. First, it is easy to verify that Ka is a proper ASD by Definition[2l In Proposi- 
tion[T|below we show that h.^. is Swath-Realizing. The proof of the Theorem follows 
immediately: 

Part (1): Ai, the isosceles right triangle at the bottom-left corner of the prome- 
nade (see Figure[T]l, is defined as the A^ (s) and A2 (s) regions for any non-rejecting 
state s in Aa. Observe that high-quality (type-A) solution must cross Ai . Assuming 
Aa is a Swath-Realizing ASD, it must move to a rejecting state on all inputs for 
which the induced swath intersects Aj . 

Part (2): By construction, the diagram regions A^ and A2 intersect each other 
in any accepting state of A^. Assuming A^ is a Swath-Realizing ASD, the induced 
swaths Si and ^2 necessarily cross each other upon moving to an accepting state (see 
Definition |6]), and contain a solution to the motion-planning problem. From Part (1), 
this can only be a type-B solution. D 

Proposition 1. Aa is Swath-Realizing. 

Due to space considerations, we include part of the proof details as supplementary 
on-line material [19] , and only give a sketch of the proof here. 

Sketch of Proof. Consider the swaths Si and ^2 induced by input to Aa. First, the 
swath-diagram regions A^ (s) and A2 (s) are defined as the Ai region, the isosceles 
right triangle at the bottom-left corner of the promenade (see Figure [T]i. Since Ai 
is clearly the intersection of a half-plane with C , Ai forms a "visibility block" in 
the configuration space. Formally, any s,t ^C , s,t ^Ai lie on the other half -plane, 
and therefore the segment between s and t does not intersect Aj. By construction 
of Aa (as in Figure |2l no sample hits Ai until Aa reaches a rejecting state, because 
the sampling-diagram rejecting region R{D(y[s]) is also defined as Ai for all regular 
(non-rejecting, non-accepting) states. Since also qi,q2 ^ Ai, we conclude that the 
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Fig. 4 Key regions in the Promenade configuration space. The di- 
rected left-path in the state graph of the automaton Aa (see Fig- 
ures [2] and ^ forces the tree rooted at qi to intersect Ai , then 
A2, then A3, reaching 52 through a type-B solution (above the ob- 
stacle). These key regions are geometrically designed to be suffi- 
ciently close to each other, such that Bi-RRT will connect them in 
this order. 
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swaths ^i and ^2 produced by Bi-RRT may not intersect Ai, which is equivalent to 
zif (5) and 42^(5), for any non-rejecting state s of A^, as required by Definition |6l 

Second, we need to show that whenever we move to a state s, the swaths ^i and 
^2 induced by running Bi-RRT intersect the A^ and A2 regions, respectively, if 
those are not empty sets (Definition [6]). We proceed by induction on the length of 
the input. At the initial state, A^ — (j) and A2 —(j),so the induction hypothesis holds 
trivially. If we move from any state s to itself, the induction hypothesis holds since Si 
can only expand further, and by the induction hypothesis, 5,- already intersects A^^ [s] 
before reading the next input. We now prove the induction for each transition in the 
left- most directed path in the state graph of A^, as illustrated in Figures [2] and [3l 

^init ^ ^^i ^ ^iii ^ ^acceptl ■ 

In each of these transitions, we extend the tree Ti and its swath Si . Other directed 
paths, which correspond to alternative tree-swapping decisions, are analyzed by sim- 
ilar argumentation, and are omitted due to space considerations. The first transition 
Sinit -^ Si occurs when we extend Si and a new sample a hits Ai, a small 5 x j 
square in the top-left corner of C free (see Figure H]). This transition is encoded in 
the fact that Ai is the "forward" region Fi of the initial state sampling-diagram 
D(j[simt] (FigureO. Ai is also the positive swath-diagram region of the next state, 
Aj^[i',] (FigureO, so for the induction hypothesis to hold, Si should intersect Ai after 
Bi-RRT runs on the sample a G Ai . Indeed, the position of the region Ai is designed 
to guarantee that any a e Ai is visible from its nearest-neighbor in Si , and there- 
fore will be added to Si by the Bi-RRT algorithm by Observation[T|in Section|2]l. A 
detailed proof of the last claim is given as on-line material fl9. Lemma 7(i)]. 

The second transition 5, -^ sm is designed to bring the swath Si closer to the 
top-right corner region of the promenade (52). The transition occurs when a new 
sample a hits the "forward" region Fi{D(y[si]), a small triangle A2 located at the 
top edge of the promenade (Figure |4ll. Again, the triangle A2 is designed such that 
a G A2 will be visible from its nearest neighbor in the swath, and therefore would be 
added to the swath Si , making it intersect the positive swath-diagram region A^ [sm] . 
A complete proof is given in the on-line material |19, Lemma 7(ii)], based on the 
relative position of the triangle A2 to the square Ai , which Si intersected already at 
the previous state. Indeed, the rationale behind the position of Ai is also to provide 
visibility for Si towards A2. 

The last transition sm -^ Saccepti, brings Si to a ^ x ^ square A3 at the top-right 
corner of the promenade (B2). Note that any point in A3 is visible from q2 (Fig- 
ure |4li, and T2 and Ti will now be connected, completing a type-B solution path 
to the Promenade problem (see lines 8-12 in Algorithm [T]l. The detailed proof for 
this transition is found in [il9i . Lemma 7(iii)]. In a similar way, each of the re- 
maining transitions in A^ can be shown to preserve the invariant that the swath 
intersects the favorable (Aj^) regions of the swath-diagrams, and that Ti and T2 
are connected whenever A^ HAj^ 7^ <j), as required by Definition |6l hence A« is 
swath-realizing. D 
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4.2 Probabilistic Analysis of A^ 

Theorem 3. If O & F has positive probability to hit the Fi or F2 regions of the 
sampling-diagram D^ [s] for any regular (non-rejecting, non-accepting) state s G 
Aa, then there exists cq > such that for any m > Mq, for any tree-swapping 
strategy: 

/"r [Accept] > cq . 

Where PrfAccept] is the probability of K^ to reach an accepting state after reading 
the tree-swapping decisions 0„, G {7i , F2}'" and samples E„ G F'". 

Proof. [Theorem |3) Assume that the current state of A^ is a regular state s, and 
that the next tree-swapping decision the automaton reads is T,. After reading the 
next sample, the automaton A^ either: (i) Hits the Fi region of D^ [s] and moves 
forward to another non-rejecting neighboring state Ni{s) (event ^i{s)), (ii) hits the 
R region and moves to a rejecting state (event d\{s)) or, (iii) remains in s. Define (i) 
and (ii) as critical events. 

Denote by vo\r{Fi) and volr{R) the volume of the Fi and R regions in Z)cj[5], 
weighted by the probability distribution of F, The conditional probability Ki{s) to 
move forward from s to Ni{s), given that a critical event happened, is: 

The probability is positive since we assumed that F assigns positive probabilities to 
hit Fi{D(j [s]) for all non-rejecting states. Aa is finite and has a directed-acyclic state 
graph (if we do not consider self-loops) in which the end nodes are the accepting 
and rejecting states, and serve as traps. Therefore, given sufficient input size, Aa 
would eventually either: (1) End up in an accepting state by moving at most k steps, 
k being the length of the longest directed path in the state graph, or (2) end up in a 
rejecting state. 

Let {1//,} be the finite set of all directed paths without loops from Si„ii to any 
accepting state. If k— 1 , then the conditional probability to move forward from the 
initial state Si„it to an accepting state, regardless of the tree-swapping strategy taken 
along the way (whether / = 1 or ; = 2 in each iteration), is bounded from below by 
min[ni{sinit) , ^li^mit)] > . Let n{s\\i/i) denote the conditional probability to move 
forward from any state s to the next state in i//,-. (as in Equation[T]). By induction on 
the length k of the longest directed path, and since the events are independent, the 
probability of A« to end up in an accepting (rather than a rejecting) state is bounded 
from below by ci , the minimal product of conditional probabilities to move forward 
over the states of i//,. Formally: 

ci = miny,,gp[]^ ^{s\Wi)] > • (2) 

Let Mq be the minimal input length that guarantees /"r [Accept V Reject] > 1 — £ for 
some small e > 0. Then for cq = (1 — e) • ci, for any m > Mq: 
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Pr [Accept after m iterations] > cq > . (3) 

D 

Explicit calculation of cq, the probabilistic bound to get a < /i -Quality solution, 
leads to the extremely small number of approximately 4 x 10^*' for the uniform 
distribution, which is the product of transition probabilities ( 19.1949.37 ); based on 
the ratio between the area of f, and FjUR in each transition. Importantly, this 
bound is true for any a > 2, since this ratio remains fixed, and does not de- 
pend on the tree-swapping strategy. In the next section we show that our bound is 
conservative. 

As a side-benefit of our analysis, we can bound Mq, the minimal number of iter- 
ations it takes A^ to end up in an accepting or rejecting state, by considering also 
the probability to remain in the same state. This probability is calculated from the 
probability-weighted volumes of the Fi , F2 and R regions, which depend also on the 
edge-size a of the inner-square and on the probability distribution F. The analysis 
can be further extended to bound the time it takes Bi-RRT to return a valid solution. 
We leave the explicit calculation outside the scope of the current manuscript. 

4.3 Extensions to General ip and Arbitrarily Small ji 

If configurations 171 and q2 are proximal to the bottom-left and bottom-right corners 
of the inner square, respectively, then /i reaches ^ . In order to generalize the proof to 
arbitrarily small /i, we need to position ^2 at the bottom side of the promenade, next 
to Ai in the bottom-left corner (as in Figure|5]l, turning Type-B solutions arbitrarily 
longer than the optimal path through Ai. The appropriate automaton requires the 
addition of a few more states for crossing the bottom-right corner of the promenade 
(A2 region in Figure [T] which now becomes a type-B region). This can be done in a 
straightforward manner analogously to the crossing of the top-right corner in A^ ; the 
details are omitted due to space limitations. In addition, we showed Theorem[3]only 
for the £1 norm, as it was easiest to draw the diagrams for its automaton. However, 
the proof does not rely on specific properties of £1 and can be generalized to any ip 
for 1 < /? < 00, and to the Euclidean norm £2 in particular. 



5 Experiments on the Promenade Problem 

We compared our results to a widely used implementation of the Bi-RRT algorithm, 
using the OOPSMP software for motion-planning [21] (version 1.2/3). We empir- 
ically tested the performance of the algorithm on an instance of the Promenade 
problem with a = 4. In each experiment, we conducted 5000 independent runs of 
Bi-RRT followed by standard path smoothing. 

In the first experiment, the initial and goal configurations were set at opposite 
sides of the promenade, next to the bottom-left and bottom-right corners of the inner 
square (as in Figure [T]l. In this case, in as much as 49.4% of cases (±1.1% std- 
err) the output path generated was at least three times worse than optimal. This 
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striking result means that our theoretical bound is fairly conservative, and the actual 
probability of Bi-RRT to produce low-quality paths is higher. This may indicate that 
when the initial and goal configurations are positioned this way, the actual bound 
for finding a low-quality paths is 0.5 ± 5 for some small < 5 << 1, where 5 goes 
to zero as the initial and goal configurations reach the corners. 

In a second experiment, where the initial and goal configurations were set right 
next to each other, but such that the bottom-left corner occludes their mutual vis- 
ibility (Figure|5]), a near-optimal path was generated in 94.1% (± 0.3%) of cases. 
However, in the remaining cases a path that is over 140 times worse than optimal 
after smoothing was selected (Figure l5Bl l. even though the start and goal configu- 
ration were in absolute proximity to each other and were not separated by a narrow 
passage. See Section[6]below for experiments in a 3D scene. 



6 More Complex Settings 

To give a completely worked out and proved example, we used a very simple family 
of toy scenarios. In this section we point out where we believe the method can be 
extended to more complex settings vs. where difficulties lie in such extensions. 

Several disconnected blocking zones. In the promenade ,^(z> we can block 
high-quality solution paths with a single triangle (Al in Figure [l]). This simpli- 
fies the calculus a lot but does not apply in more complicated examples. We can 
then think of this setting as a sub-problem within a bigger problem. For instance, 
the analysis follows through also when we replace the single blocking-zone pred- 
icate with more complicated predicates describing several (disconnected) blocking 
zones (Figure [6Al l. In this case, the automaton reaches an accepting state after we 




Fig. 5 Experimental results. [(A)] Instance of the Promenade problem for a = 4. The initial 
and goal configuration q\ and q2 are proximal, such that their mutual visibility is blocked by 
the left-bottom comer. The merged trees (merge point marked by arrow) are shown. [(B)] The 
solution path extracted from the trees in |(A)| is over 140 times worse than optimal, even after 
standard path smoothing, as typical in 5.9% of independent runs. |(C)] Overlay of represen- 
tative output paths for the 3D cube-within-a-cube problem. The edge-size of the inner-cube 
(orange) is 90% that of the bounding-cube (not shown). A low-quality path was returned in 
97.3% of cases, even after path smoothing (see Section|6]l. 
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Fig. 6 More complex settings: [(A)| Maze with two disconnected blocking-zones (red trian- 
gles), which prevent good quality solutions (green and gray lines), forcing a path with much 
lower quality (dashed blue line). |(B)] The Promenade sweeped by £3 in the z-direction. |(C)| 
cube-within-a-cube. 



sample within some (low-quality-path) zones in the appropriate order {F regions 
of a sampling-diagram), while avoiding the blocking zones, which move the au- 
tomaton to the rejecting state {R regions). This type of problems illustrates the sen- 
sitivity of tree-based planners to a sequence of critical events, and may justify a 
heuristic approach we presented recently for improving path quality, by hybridizing 
sub-solutions from multiple independent runs |5, 22]. 

Higher dimensions. We consider two extensions of the Promenade problem 
to higher dimensions. The first extension is by a cross-product with a hyper-box 
[0, £3] X [0, £4] X • • •. In 3-space this amounts to sweeping the two-dimensional scene 
in the direction of the third axis along a segment of length £3 (Figure l6Bl l. If each 
dimension of the hyper-box is sufficiently small, the analysis seems to hold almost 
verbatim. Another possible extension is the "(hyper)cube-within-a-(hyper)cube" 
(Figure I6CI 1. According to our initial analysis, it may no longer be true that the 
probabilistic bound is uniformly independent on the edge-size a of the inner 
(hyper)cube. However, if we are just interested in crafting an ASD to provide a 
probabilistic bound for a specific instance of the problem, the technique seems to 
allow it if we define the appropriate critical regions, although the exact details must 
be worked out and proved. Indeed, in experiments we conducted on this problem 
where the edge length of the inner-cube is 90% that of the bounding-cube edge 
(Figure lSCl l. the quality of solution paths was low (over 1.2 worse than optimal and 
typically much worse) in 97.3% ± 0.4% of runs (from a total of 2, 000 independent 
runs). 

Other measures of path quality. Analogous ASD's can be constructed for other 
common quality measures such as clearance from obstacles or combined weighted 
clearance-length measures. In particular, it seems that the case of average clearance 
or bottleneck clearance, where we wish to maximize the minimum clearance along 
a path, can be addressed by a simple adaptation of the current analysis, if we reduce 
the clearance of any Type-B solution (Figure[T]) by narrowing down the width of the 
top passage in the Promenade scene, or alternatively, if we block sampling near the 
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medial axis of the free space. The approach that uses the medial axis may lend itself 
more easily to extensions to higher dimensions. 

Other motion-planning algorithms. The essence of the automaton formulation 
is the dual representation of motion-planning algorithms. This representation re- 
duces the analysis of path quality to the identification of a set of critical events in 
the progress of such algorithms. We therefore anticipate that our method can ease 
the quality analysis of other motion-planning algorithms. In particular, it would be 
interesting to analyze other variants of RRT, as those suggested by Karaman and 
Frazzolil 10], along the lines presented here. 

Richer automaton states and alphabets. Another way in which the Automa- 
ton of Sampling-Diagrams framework may be generalized to higher-dimensional 
configuration spaces is by using different types of automaton states or alphabets 
than the ones presented here, which will assist further to effectively identify crit- 
ical events during the construction of roadmaps, by taking advantage of a richer 
set of predicates that can be used to analyze complex problems. In this respect, it 
may be helpful to borrow from the work on motion-planning with Linear- Temporal 
Logic (LTL) specifications, where various types of Finite-State Machines are used 
in conjunction with planning algorithms for the purpose of model checking (see, 
e.g., IlifiQ)- 

7 Conclusions 

In this study, we show for a common variant of the Bi-RRT algorithm that the proba- 
bility for low quality paths is bounded away from zero. To the best of our knowledge, 
we present the first theoretical results on path quality of the Bi-RRT algorithm that 
are topological in nature in the sense that alternative paths lie in different homotopy 
classes and are therefore invariant to smoothing. We prove a wide gap between the 
optimal path, or even any path that is homotopy equivalent to the optimal path, and 
the quality of actual output paths. Our empirical results suggest that our bound is 
conservative, and it would be a worthy challenge to reach a tighter bound. An ad- 
vantage of the automaton formalism is probably that it ignores many of the details 
involved in analyzing the running progress (perhaps at the cost of looser bounds). 

In conclusion, we presented here one of the first theoretical results on the quality 
of output paths generated by sampling-based algorithm and the first to assess high- 
quality paths that occupy a non-negligible portion of the space of all paths. We 
anticipate that analysis of the type we propose here will facilitate the design of 
robust algorithms that also perform better in practice. 
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Sufficient Conditions for the Existence of 
Resolution Complete Planning Algorithms 

Dmitry S. Yershov and Steven M. LaValle 



Abstract. This paper addresses theoretical foundations of motion planning with dif- 
ferential constraints in the presence of obstacles. We establish general conditions 
for the existence of resolution complete planning algorithms by introducing a func- 
tional analysis framework and reducing algorithm existence to a simple topological 
property. First, we establish metric spaces over the control function space and the 
trajectory space. Second, using these metrics and assuming that the control system 
is Lipschitz continuous, we show that the mapping between open-loop controls and 
corresponding trajectories is continuous. Next, we prove that the set of all paths 
connecting the initial state to the goal set is open. Therefore, the set of open-loop 
controls, corresponding to solution trajectories, must be open. This leads to a simple 
algorithm that searches for a solution by sampling a control space directly, without 
building a reachability graph. A dense sample set is given by a discrete-time model. 
Convergence of the algorithm is proven in the metric of a trajectory space. The re- 
sults provide some insights into the design of more effective planning algorithms 
and motion primitives. 



1 Introduction 

We consider the general problem of motion planning under both differential con- 
straints and obstacles. A control system, geometric robot model, and a model of ob- 
stacles in the workspace are given. The task is to compute a control signal that brings 
a robot along a trajectory from an initial state into a goal region in a state space that 
may represent configurations and possibly their time derivatives. This problem is a 
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unification of several fundamental, classical problems in robotics: \) Nonholonomic 
planning: In this case, the differential constraints may arise from wheeled mobile 
robots and planning occurs in the configuration space ||T|; however, dynamics and 
constraints due to angular momentum may also be included. Such problems usu- 
ally arise from underactuated (less controls than the number of degrees of freedom) 
systems. 2) Kinodynamic planning: Here, there are both velocity and acceleration 
bounds, and the system is fully actuated Q. 3) Trajectory planning: This problem 
has been pursued for several decades IS] 111 and typically involves computing an 
open loop control for a manipulator while satisfying the kinematics and dynamics 
expressed as a control system. See Chapter 14 of fSl for a detailed presentation of 
this unified class of problems. 

In spite of all of this effort, there is still no general characterization of the par- 
ticular conditions under which an algorithmic solution exists. Since basic motion 
planning (without differential constraints) is already PSPACE hard |[7J, and par- 
ticular instances of motion planning with differential constraints are even harder, 
there is not much hope for efficient, complete solutions. In this context, complete 
means that the planning algorithm must return a solution whenever one exists; other- 
wise, report failure. Therefore, virtually all approaches to the problem are sampling- 
based, which employ discretization and heuristics to incrementally explore the state 
space by concatenating pieces of control signals to obtain a search tree of collision- 
free trajectories. In this case, the most we can hope for is resolution completeness 
ISl , which means that the algorithm correctly finds a solution whenever one exists; 
however, it may run forever if one does not. This is analogous to classical Turing 
decidability vs. Turing recognizability, which are comparable to completeness and 
resolution completeness, respectively. We believe that having general conditions for 
the existence of resolution complete algorithms may be useful in the formulation of 
solvable robotics problems, in the design of better sampling-based planning algo- 
rithms, in the design of motion primitives ifOl fTOlfm . and possibly for the verifi- 
cation problem lfT2l[T3l , which is a negated form of planning that establishes path 
nonexistence. 



Motion primitives 



Finite length sequences of 
motion primitives 



Input space 



-s- Control space ■ 



State space 



-s- Trajectory space 



Fig. 1 Sets, spaces, and relations 



In this paper, we determine simple, general requirements for the existence of 
a resolution complete planning algorithm based mainly on Lipschitz conditions on 
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the control system mapping. This allows substantial generality and is inspired 
by analysis of the convergence of numerical dynamic programming algorithms 
lfT4lll5l . Our basic approach in this paper is to analyze the relationships between 
the six spaces shown informally in Fig. [T] The input space is the set of possible 
control system inputs and the state space is the configuration space or more gener- 
ally the phase space of the system. Considering inputs and states parametrized over 
time, we design suitable metric spaces for both the space of control signals (called 
the control space) and the space of trajectories (the trajectory space). By estab- 
lishing the continuity of the mapping between these spaces induced by the control 
system, we show that resolution completeness boils down to a simple topological 
condition. Furthermore, we constructively prove existence by providing a resolution 
complete algorithm that systematically enumerates candidate solution trajectories 
by concatenating sequences of motion primitives, which form a discrete set, suitable 
for computation. This is closely resembles the execution trace of most existing plan- 
ning algorithms, which incrementally sample and search the state space (see lfT6l l6l 
for surveys). 

Resolution completeness in this general setting is provided by ensuring that the 
set of all computed control signals is dense in the space of all control signals. One 
surprising observation, however, is that this may be achieved even where it is impos- 
sible to incrementally reduce the radius of the largest empty ball {dispersion ifTTl ) 
in the space of control signals. This peculiar behavior is explained in Section |2l 
where basic sampling concepts are defined. Section [3] formally defines the general 
problem. Section|4]develops a continuous mapping from control space to trajectory 
space by carefully designing appropriate metric spaces. The main algorithmic con- 
structions and theorems are presented in Section[5] which give sufficient conditions 
for the existence of a resolution complete planning algorithm. Conclusions appear 
in Section|6l 



2 Precompactness and Sampling Convergence 

Before providing the main technical results of the paper, a counterintuitive property 
regarding sampling and convergence needs to be addressed. Consider sampling- 
based planning algorithms for the basic motion planning problem (no differential 
constraints). Achieving resolution completeness (or alternatively, probabilistic com- 
pleteness) amounts to assuring that the computed samples are dense in the limit as 
the number of iterations tends to infinity. Intuitively, the sampling resolution gradu- 
ally increases over time. This notion can be nicely captured by defining dispersion 
of a sample set P m& subset of any metric space X |il7i|: 

5(/')=sup{inf{p(x,p)}}, (1) 

in which p is the metric. 
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Typically, the sample set increases gradually during the execution of a planning 
algorithm, and the dispersion converges to zero in the limit. This is formalized 
by considering P as an infinite sample sequence P, which is a set together with a 
specified linear ordering. We say that P converges to X if and only if 

lim5(PU)=0, (2) 

in which P\n denotes the first A^ elements of P. Note that the convergence rate may 
depend on the ordering. 

In the coming sections, we consider notions of denseness, dispersion, and conver- 
gence over the function spaces of control signals and trajectories. For these spaces, 
it might be surprising that there are dense sample sequences for which the dispersion 
does not converge to zero. In other words, the samples may eventually get arbitrarily 
close to every point in the space, even though they are not converging to that space. 
For a simple example of this behavior, let X = R. For any finite set of samples in R, 
the dispersion is always infinite. Nevertheless, any ordering placed on Q, the set of 
rationals, yields a sequence that is densqj in R. For a bounded space, such as 5' and 
the configuration spaces arising in robotics, this behavior does not occur: A dense 
sequence must drive the dispersion to zero. 

Here is a critical question for sampling-based planning with differential con- 
straints: What property must X have to enable convergence? Define X to be pre- 
compact if and only if for any e > there exists a finite cover of X with open balls 
of radius e. The following lemma answers the question. 

Lemma 1. A sequence P that is dense in X converges to X if and only if X is 
precompact. 

Proof. We prove necessary and sufficient conditions separately. 

Necessary condition: Assume P is convergent, and let e be greater than zero. Find 
Ne such that d{P\N^) < £■ Consider a set of open balls of radius e centered at points 
of P\n^. From the definition of dispersion, it follows that this set of balls is a finite 
cover of X. Since e is arbitrary, X is precompact by definition. 

Sufficient condition: Suppose, on the other hand, that X is precompact. Thus, for 
any e > 0, there exists a finite cover of X with open balls of radius e. Denote these 
balls as B,. Since the sequence P is dense, the intersection of P with each fi, is 
nonempty. Take the smallest A^^ such that P\n^ has at least one element in each B,. 
By construction, the dispersion of P\ni. in X is bounded by 2e. Since e is arbitrary 
and for A^ > Ng we have 5{P\n) < 5{P\nc), the sequence converges. D 

Note that this is a property of the space X, and not a particular sample sequence. 
Our analysis will demonstrate that under general conditions space of all control 
signals is not precompact. Therefore, it is impossible to achieve convergence in the 
control space. However, we will show precompactness of the space of finite-time 
trajectories. In this case, the denseness implies convergence, which means that if a 



' Here, dense means that the topological closure of Q is all of 1 
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solution is not found in a finite number of steps, it either does not exist or the goal 
set must be smaller than the reached dispersion of the sampled trajectories. 

3 Problem Definition 

Let the state space, X C M™, be a smooth manifold of dimension n. Let U be the 
input space, which is a compact subset of K*^ with k<n. A given mechanical system 
is expressed in local coordinates a^ 

x = f{x,u), (3) 

in which x = dx/dt. Also, in the equation above, x E X and u eU. 
It is assumed that / is a Lipschitz continuous function in both x and u, which 

implies that there exists positive real-valued constants L^ and Lj, such that 

||/(x,M)-/(y,M)|| <L,||^-x'|| and \\f{x,u) ~ f{x,u')\\<U\\u-u'\\ (4) 

for all x,x' £X, and u,u' £U . The norms used here are defined on the ambient 
vector spaces W" and M*^, respectively. Furthermore, it is assumed that there exists 
M > such that 

||/(x,m)||<M (5) 

for all X G X and u E U. 

Let ^ be the set of all measurable functions, m, defined on [0 , T], for all 
T G [0, oo), with values in U. Similarly, denote ^ to be the set of all Lipschitz 
continuous functions, x, defined on [0, T], for T > 0, with values in X. We re- 
quire that for all functions in ^ the Lipschitz constant is bounded by M. In other 
words, for any element x € J^ and any given t and f ' in the domain of x, we have 
\\x{t) -x{t')\\ < M\t - 1'\. Also, define T : '2< U J^' ^ [0, oo) to return the duration 
of a control or a trajectory, depending on the argument. 

Constraints are imposed on X that account for mechanical limits due to kine- 
matics and dynamics, and also to avoid collisions with static obstacles. Let X^ee 
denote an open and bounded subset of X that consists of all states satisfying these 
constraints. Usually, Zfree is defined only implicitly via representations of the kine- 
matics and obstacles. Therefore, a collision detection algorithm is often needed to 
evaluate whether states lie in Xfiee. 

The planning problem is as follows. Assume X, U , and / are given. Furthermore, 
an initial state, xi G Xfrte, and open goal set Xg C Xfree are specified. The problem is 
to compute m G ^ such that for the corresponding x, satisfying (O with given H, the 
following is true: 1) x{Q) ~ xi, 2) x(t(m)) G Xq, and 3) the image of x lies in Xfree- 

To accomplish this task, we assume the existence of an integration module which 
integrates ^ to produce trajectory segments, and a collision detection module 
which determines whether a trajectory segment lies entirely in Xfree. 



^ We may consider more general case of time-varying systems i = f{x,ii,t), without 
changing further analysis in the paper. We choose the time-invariant case for notational 



convenience. 
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4 Preliminary Concepts and Properties 

In this section we introduce some preliminary concepts that are necessary to estab- 
lish basic properties for an algorithm to be resolution complete and convergent. We 
define metrics on the space of controls and the trajectory space, and show that the 
relation between controls and trajectories is a well-defined continuous function. 



4.1 Designing Metric Spaces on ^ and JT 

The control space, ^, can be made into a metric space as follows. Let a be the 
diameter of the smallest ball that contains U . We call a the diameter ofU.lt is easy 
to verify a <°° because U is assumed to be compact. For two controls u and u' in 
'W define the Li-type metric 

pop^{u,u')^ ||M(f)-M'(r)||dr + a|T(M)-T(M')| , (6) 



in which T = min(T(M),T(M')). 

Note that this metric is different from a standard Ly metric due to variable domain 
length of functions in "^ , which is accounted by an additional term in (O. The extra 
term separates any meaningful controls from the control defined on the zero length 
time interval, we call it zero control. 

The choice of the metric is motivated by Figs. |2a] and |2b] Consider driving a 
car around a comer. The trajectory deviates only slightly if steering is applied with 
small delays. Moreover, the trajectory deviation depends on the delay continuously. 
The introduced Li-type metric ^ captures this behavior. 





t(«') t(«) R xi 



Fig. 2a Control signals. The area of the Fig. 2b Corresponding trajectories 

shaded regions corresponds to p^ (m, u') 



To save space, we omit the (tedious) proof of the following lemma: 
Lemma 2. The control space '^ is a metric spac^ with respect to Q. 



^ More precisely, it is a pseudometric space f_18J because p^^/{u,u') = for some it ^ it'. 
However, if two controls are identified in case their distance is zero, then the resulting 
space is a metric space. 
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Next we describe a metric on the space of all trajectories, 2^ . For two trajectories x 
and i' define the Lo„-type metric 

p^;(x,x')= sup |||x(0-x'(0|||+M|T(x)-T(f')l, (7) 

0<r<r '- -' 

in which T — min(T(i), t(-?'))- 

On the trajectory space, an Lj-type metric cannot be used because it ignores 
"spikes" in the trajectory deviation. The two trajectories illustrated in Fig.[3alare 
"close" in terms of the Li-type metric. However, they exhibit qualitatively differ- 
ent behavior; the first one does not intersect the obstacle and the second one does. 
Moreover, it is possible to find two trajectories arbitrary close in Li-type metric, but 
with arbitrary large deviation between them. On the other hand, later we will show 
that two trajectories which are "close" in the L^o-type metric behave similarly; see 
Fig.Ebl 



XI 



^© 



y! 



Fig. 3a Li -type metric Fig. 3b L^-type metric 

Lemma 3. The trajectory space J?^ is a metric space with respect to (0. 
(The proof is omitted to save space.) 

4.2 Relating Controls to Trajectories 

Next, we analyze the mapping between a control fi G ^ and the corresponding 
trajectory originating from some xq €X. Denote the mapping as a function x{xo , U) : 
X X ^ — *■ ^. Note that for fixed m and xq, the trajectory x(jco,m) is a function of 
time and it satisfies the integral equation 



x{xo,a,t)=xo+ / f{x{xo,ii,s),a{s)) 
Jo 



d^ , (8) 



which is equivalent to Q. In the integral equation, x{xo,ii,t) denotes the point of 
the trajectory i(xo, m) at time t. 

Lemma 4 (Well-defined trajectories). For any initial state xq ^ X and any control 
signal a e %", the trajectory x{xq,u) belongs to 3^' . 

Proof. For any given S G '^, function /(x, M(f )) as a function of x and t satisfies the 
Caratheodory conditions. Hence, the solution for the differential equation (O, with 
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initial value at any xq, exists, is unique, and is absolutely continuous on [0, t(m)]. 
We now use the integral equation (O to prove thatx(xo, u,t) is Lipschitz continuous 
as a function of time. Consider 



x(xo,M,f) — x(xo,M,f') < / /(x(xo,M,i'),M(5))di' 



<M\t-t'\ 



(9) 



Note that the Lipschitz constant is bounded by M\ therefore, x{xq,u) is in ^. D 
To show the continuity of x(xo, S), we first prove the following lemma. 

Lemma 5 (Bounded trajectory deviation). 

Let u and u' be two independent controls, with x{ii) = t(m') = T. Assume further 
that X = x{xq , m) and x' = x(xq , u!) are the corresponding trajectories. The deviation 
between trajectories x and x' at any time f £ [0 , T] is bounded by 

\\x{t) -x'{t)\\ < (llxo-xj)!! +L„y exp{-L,s)\\a{s)-u'{s)\\ds^ exp(L,.f) . (10) 

Proof. Trajectories x and x' satisfy the integral equation ^, with U and m', respec- 
tively. Using the integral equation and the Lipschitz continuity of /, we derive the 
integral inequality for the trajectory deviation 



l|x(r)-.?'(r)|| 



XO - Xq + 



d^ 



< ||xo-x;,||+ ['L,\\u{s)-u'{s)\\+LMs)-x'{s)\\ds . (11) 
Jo 

From the integral form of Gronwall-Bellman inequality ||T9ll a bound on the 
trajectory deviation at any time / 6 [0 , T] follows as 



\\x{t)-x{t)\\ < ||xo-Xo||exp(L,?) 



\\u{s)-u{s)\\ds 



+ L,L, [ rexp{L,{t-r))\\u{s)-u'is)\\dsdr . (12) 
Jo Jo 

The double integral is reduced to a single integral by applying Fubini's theorem ifTSl 
to obtain 



Jo 



exp(Lv(f - r))\\i2{s) - ii'{s)\\ ds dr 



= exp{Lx{t ~ r))\\ii{s) — il'{s)\\drds 

= -i ['\\u{s)~u\s)\\{expiL,{t~s))^l)ds (13) 

kx JO 

The combination of the results above finishes the proof. D 
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The next theorem establishes the continuity of the function x{xq,u) with respect to 
both XQ and u. 

Theorem 1 (Continuity of x(xo,m)). The mapping x{xq,U) is continuous. 

Proof. Take two initial points xq and .Tq in X and two control signals U and U' in 
'^. Letx andi' be defined as inLemma|5] and let T* — min(T(x),T(i')). It follows 
from Lemma|5]that for all f £ [0 , T*] 

\W} -x{t)\\ < (||xo-^|| + L, J' txp{-L,s)\\a{s) - i/{s)\\ds'^ exp(L,0 . (14) 

Take the supremum over the interval [0 , T*] on both sides and derive 

sup \\x{t)-x'{t)\\<exp{L,T*)(\\xo-x'o\\+LJ \\u{s) - r/{s)\\ds) . (15) 

0<r<7'* V 70 ■' 

Using the derivation above, we bound the distance between trajectories in terms of 
the distances between initial values and control signals 

P3r{x,x)= sup \\x{xQ,a,t)-x{xQ,u\t)\\+M\T-T'\ 

0<r<7'* 

/ r'^* \ M 

< exp(L,r*) ||xo -^11 +L„ / \\u{s) - u'{s)\\ ds)+—a\T- T'\ 
\ Jo ) a 

<exp(L^r*)||xo-Xo||+maxfL„exp(Lvr*),— JPo2/(m,m) . (16) 

The inequality above proves continuity of the map. D 

Since X and ^ are metric spaces and x(xo,m) is continuous, we may employ 
topological methods to analyze properties of subsets and sequences in control and 
trajectory spaces, as well as relations between them. 

43 Topological Properties 

We next address properties of the set of collision-free paths connecting xi with Xq. 
Consider ^, the subset of ^ containing all trajectories that originate from xi. 
Define the induced subset topology on ^. Also, consider the set <^oi of solutions 
to the path planning problem, which consists of all paths in ^ that originate from 
X[, terminate in Xq, with the image contained in Zfiee (collision-free). Note that an 
element of ^oi may not be necessarily a trajectory governed by the system ^. 

Theorem 2. IfXf^gf. andXQ are open in X, then ^oi is an open subset of 3fy\. 

Proof. Let x be in ."^soh and let T ~ t{x). According to the definition, the image of 
X is contained in Xfree and the terminal point, x(r), is in Xq. Since the image of x is 
compact and the complement of Xfiee is closed, the distance between these two sets. 
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5i , is well-defined and strictly positive. Moreover, the distance from the terminal 
point to the complement of Xg, ^2, is also well-defined and positive. 

Consider 5 = min(5i , ^2) and any trajectory x' from ^ such that p,f (i,i') < 5. 
First, we show that the image of x' is in Xfree. Assume to the contrary that there 
exists some f' g [0, t(x')] such thati'(f') ^ Xfj-ee- Letf — min(f',r). The bound 

\\m-^'it')\\<\\m-^'m + \\^'it)-^\'')\\ 

< ||jc(f)-x(f)||+M|r-f'| 

<p^ix,x')<5<di (17) 

contradicts the definition of 5i . 

Second, we prove that .f'(r') G Xq, in which T' = t(.t'). Assume to the contrary 
that x'{T') ^ Xa- Let T* = mm{T, T') and consider the bound 

UT) ~x'{T')\\ < \\xiT) -x(r)|| + UT*) ~x'iT*)\\ + \\x'iT*) -.f'(r')ll 
< M\T- T*\ + \\x{T*) -x'(r*)|| +M\T* - T'\ 
^\\x{T*)-x'{T*)\\+M\miix{T',T)-mm{T',T)\ 
<p.tix,x')<5<52, (18) 

which contradicts the definition of ^2. By definition, x' belongs to ^soi- □ 

Denote a set of solutions to the motion planning problem as '^soi- Clearly, for any 
given M in '^soi, the trajectory x{xi,u) belongs to ^soi- Therefore, ^soi is a preim- 
age of c^soi with respect to x{x[,u) : '2^ — *■ ^j. From the continuity of x{x[,u) and 
Theorem|2]it follows that ^soi is open. We state this result as a separate theorem: 

Theorem 3. Assume all the conditions of Section\3\are met, then '^soi '■J open. 



5 Planning Algorithm and Resolution Completeness Conditions 

In this section we establish sufficient conditions under which resolution complete al- 
gorithms to the motion planning problem exist. A simple resolution complete algo- 
rithm is constructed. We also prove the convergence of sampling-based algorithms 
in the trajectory space. 



5.7 Controls via Concatenation of Primitives 

Let Z C ^ be a countable set of motion primitives, each defined over a closed 
and bounded time interval. Borrowing concepts from the theory of computation, U 
can be interpreted as an alphabet. If motion primitives are applied in succession, a 
control that represents their concatenation is obtained. For example, if (7i , a2 G .S, in 
which ai : [0, ti] — > U and 02'- [0, to] -^ U, are applied in succession, the resulting 
control, denoted by Gi G2 is 
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Allowing any finite number of concatenations, each resulting control can be ex- 
pressed as a string, which is a finite sequence of motion primitives in 1,. Considering 
this, the set of all controls that can be formed from motion primitives is the Kleene 
star of Z, which is denoted and defined as 

I,* = {(7i a? • • • CTi- I ^ > and each <7,- G X}. (20) 

Note that we do not allow infinite sequences of motion primitives to be applied. The 
definition of X* allows the empty string, which is assumed to be zero control. 

The following lemma establishes a useful property of X* for the purposes of 
computation. 

Lemma 6 (Rectangular enumeration argument). For any set, E, of motion prim- 
itives, the set, E*, of all strings is countable. 

Proof. Consider E*, which consists of all strings of length not greater than n and 
composed of any characters of the alphabet E\„. For example, Zj* = {ai}, U| = 
{ ai , ai , cJi C7i , (7i 02 , (72 CTi , a2 a2 } , and so on; see Fig. |4] Verify that each E* is finite, 
and E* = Ur=i ^n- Hence, E* is countable as a countable union of finite sets. D 





1 2 


3 


Oi 


1 

{<7i(T2, 


{aiCTiai} 


02 


{02] CT2ai, 

0202} 





Fig. 4 Even if X is count- 
ably infinite, Z* is count- 
ably infinite, as shown by 
rectangular enumeration 
argument. We show that 

•^* = Ujr=l -^n ' in which 

sets I,* correspond to re- 03 jo's} 

gions bounded by dashed 

lines and finite. 



To facilitate the development of resolution complete planning algorithms, it will 
be helpful to introduce a set of motion primitives that is straightforward to describe 
and utilize. Moreover, assuming that all motion primitives in the set E are encoded 
digitally, it follows from the lemma above that all strings in E* are computable. 

Suppose that a system is defined as in (|3]l. First, replace U with a countable 
subset. If U is uncountably infinite, then choose a countable, dense subset U^ cU. 
For example, Ud could be the set of all m £ (/ for which all coordinates are rational 
numbers. If U is already countable, then we may simply let Ui = U . 

Let Eii C "^ be called the discrete-time model, and be defined as the set of all 
constant functions, m : [0 , f] -^ Ua, in which t — 1/2' for all / G N. Thus, the dura- 
tion of motion primitives can be 1/2, 1/4, 1/8, and so on. Any sequence of time 
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intervals that converges to zero may alternatively be used. The set of all strings 
that utilizes the alphabet E^t and the concatenation rule ( IT9l l consists of piecewise 
constant functions. We denote this set of strings as E^^. 

The discrete-time model is just one of numerous possible sets of motion primi- 
tives. However, the particular choice of E depends heavily on the considered system, 
the intended application, and the efficiency of the planning algorithm. Virtually any 
definition of E is allowed, provided that E* is dense in ^. We will show that this 
requirement is sufficient for a resolution complete algorithm to exist. 



5.2 Planning Algorithm 

Based on the background results of Section |4l we are now ready to establish the 
existence of resolution complete algorithms in a very general setting. The existence 
is demonstrated by construction of a simple string enumeration algorithm; refer to 
Algorithm[T] 



Algorithm 1. String enumeration algorithm 

Input : a set of motion primitives Z, the initial state xi, a goal set Xq, a collision detection 

module, an integration module. 

Output : a solution control a d Z*. 

1: n^O 

2: loop 

3: £i^l/2" ;n^/7 + l 

4: d ^- select the nth string from Z* 

5: -i{xi, d) <— integrate the trajectory, starting from xi under the control a, with tolerance 

6: Ci ^- (true or false) determine whether the error cone around x{xi,a) is contained in 

^free 

7: C2 ^- (true or false) determine whether the error cone of x{xi,a) terminates in Xq 

8: IfCi ACithen 

9: return a 

10: end if 

1 1 : end loop 



The selection operation 4 is required to be systematic, which means that strings 
are selected so that all of E* is enumerated as the number of iterations tends to 
infinity. This is always possible because E* is countable. 

For the integration line 5, validation line 6, and termination line 7, we would ide- 
ally like to have them executed in constant time with perfect precision. In practice, 
however, this is usually not possible. Most often, a numerical integration is neces- 
sary, which causes errors to propagate to the remaining two operations. Due to these 
limitations, an alternative will be defined: 1) An exact computation model, and 2) a 
numerical computation model. 
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Let the exact computation model refer to the case in which all operations are 
performed exactly in finite time without errors. This model is the simplest to ana- 
lyze; however, it is also less realistic. Note that for the exact computational model 
the parameter e\ is ignored in lines 5, 6, and 7 of the algorithm. 

To account for errors arising from a numerical integration error, let the numerical 
computation model be defined as shown in Fig. |5] Assume that the precision of the 
numerical integration algorithm can be tuned using a parameter £; £ {Q,°°), and the 
error of the numerical trajectory is bounded by 



\\x{xQ,u,t)-x{xa,u,t)\\ <eiF{t) 



(21) 



in which F is a strictly positive, monotonic function of time, and x{xq,u) is the 
numerical trajectory. 



Fig. 5 The numerical com- 
putation model. The dotted 
region is the error cone con- 
structed around the numer- 
ically integrated trajectory 
(solid line), which contains 
the exact trajectory (dashed 
line). 




The proposed numerical error model allows us to construct an error cone (around 
the numerical trajectory), which contains the exact trajectory; see Fig.|5l The error 
cone is then used for validation and termination purposes. It will be assumed that 
validation and termination algorithms are conservative, that is, the algorithm must 
reject a candidate solution if there is any possibility that the trajectory leaves Xfree 
or fails to terminate in Xq. Moreover, it may also be possible to reject strings under 
the numerical computation model by determining that all possible trajectories either 
leave Zfree or fail to terminate in Xq. However, this will not be considered in detail 
in this paper. 

5.3 Sufficient Conditions for Resolution Completeness 

We derive a set of sufficient conditions which guarantee that the proposed algorithm 
is resolution complete. The result is demonstrated for both exact and numerical 
computational models. Furthermore, we demonstrate that the discrete-time model 
satisfies these sufficient conditions. 



Theorem 4 (Sufficient conditions for the exact computational model). If I,* is 

dense in ^ , then Algorithm\I\is resolution complete under the exact computational 
model. 
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Proof. We have already shown that the set '^soi C '^ is open, and in case a 
solution exists, ^soi is nonempty. Since U* is dense, its intersection with '^soi is also 
nonempty. Hence, the algorithm, selecting systematically strings from E* , eventu- 
ally finds an element that is also in ^soi- Under the exact computational model, the 
selected string is accepted. D 

Theorem 5 (Sufficient conditions for the numerical computational model). If 

1,* is dense in ^, then Algorithm Q] is resolution complete under the numerical 
computational model. 

Proof. Let m £ '^soi be a solution to the motion planning problem and £ be a positive 
constant. Furthermore, let 5 — e/(2max [L„exp(LvT(M)),M/a] ) and A? G N be such 
that 1/2^^ < e/(4f (t(m) + 5/a)). Find a G I,*, such that its number is greater 
than N and p<g/ (m, a) < 5. It is always possible to do so because 1,* is assumed to 
be dense in 'W . With such choice of a, it follows that |t(m) — T(d')| < d / a, and, 
following the proof of Theorem[T] the distance pf£{x{xi,u),x{xY,a)) < e/2. 

Consider the distance between the exact trajectory, x{xi, S), and the numerically 
integrated trajectory, x{xi, a): 

p.9:{x{xi,u),x{xi,a)) < px{x{xi,u),x{xi,a)) + px{x{xi,a),x{xi,a)) 
e , ,^ss e eF(T(G)) e e 3e 

Furthermore, note that the error cone around x{xi,a) stays in e/4-neighborhood of 
the numerical trajectory. Hence, the error cone is contained in the e-neighborhood 
of x(xi, m). By adjusting e, which is arbitrary, we can ensure that the error cone is in 
.i^Tsoj. Therefore, a is accepted by Algorithm[T] D 

Now that we have a simple test to determine whether Algorithm [T] is resolution 
complete, we apply it to the discrete-time model. 

Theorem 6 (Denseness theorem). The set E^^ is dense in ^ . 

Proof. Intuitively, the proof follows from the well known fact that piecewise con- 
stant functions are dense in the space of measurable functions ll20ll . Unfortunately, 
the discrete-time model is only a proper subset of the set of all piecewise constant 
functions. Here we outline our proof that overcomes this difficulty. Refer to Fig. [6] 
for details. 

Consider any m in '^ and e greater than zero. Assume that U is partitioned with a 
collection of measurable sets {[/,} with nonempty interior such that the diameter of 
each set is less than EJ{7.x(u)). Let A,- be a preimage of [/,- with respect to u. Since 
ii is a measurable function and [/, is a measurable set, A,- is measurable. 

Next, consider the approximation of A, s from within by intervals of length 1/2^; 
denote these intervals as Ij = [(y — l)/2^,7/2^]. Assume the tolerance of the ap- 
proximation is less than e/ (2a), collectively for all A,s. This means that the measure 
of the difference between A, and all intervals which are subsets of A,- does not exceed 
£/(2a), collectively for all sets A,-. 
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Fig. 6 Collection {(/,} 
is a partition of U . Sets 
A,s are preimages of t/[S 
under the map u. Intervals 
/jS approximate A,s from 
within. The function u' 
is a piecewise constant 
approximation of u, defined 
on the collection {Ij}- 
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Finally, define m,- to be any element of Ua that is also in [/, (it is possible to find 
such Uj because t/,- has nonempty interior and U^ is dense in U), and let mq to be any 
in t/d- Construct the approximation 



fi'(f): 



M/ if 3 / and 3 j such that t £ Ij C_ A,- 
Mo otherwise 



By construction, u' is in E^^. Now, compute the distance 



1: = £ 



(23) 



(24) 



Here, X' denotes the summation over j such that Ij C A,- for some ;. 

In conclusion, we note that for any fi G ^ and e > we found u' G E^^ such that 
pc^^ (m, u') < e. Therefore, E^^ is dense in '^. D 

Finally, we have established the main result of the paper: 

Theorem 7. Under exact and numerical computation models, there exists a set of 
motion primitives and a resolution complete planning algorithm. 



Proof. The result follows from Theorems |4l|5] and|6l 



D 



Perhaps surprisingly, the algorithm may be resolution complete without actually 
causing convergence in dispersion. As discussed in Section |2l a dense sample se- 
quence on a precompact set converges. However, "^ is generally not precompact. 
Therefore, our proposed algorithm does not converge in '^ . Nevertheless, it con- 
verges in the trajectory space for the bounded time problem, assuming Xfiee is 
bounded: 

Theorem 8 (Convergence of Finite Length Trajectories). Assuming that execu- 
tion time is bounded by T, the space ^.7- = {x G ^ | x{x) < T} is precompact. 
Hence, a dense sequence of controls corresponds to a dense sequence of trajectories 
that converges to the set of all feasible trajectories. 



Proof. For the set S^ij, the following two conditions hold: 
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1. Uniform boundedness: There exists Ci > such that ||.?(?)|| < Ci for all x in 
^'ij. The condition is easy to verify. Let C\ = MT + ||.v:i|| and consider 

||x(r)|| < \\x{t) -m\\ + 11-^(0) II < Mr + IIa-iII < d ; (25) 

2. Uniform equicontinuity: There exists C2 > such that for all i G J^ 7 and any t 
and/' in [0, T(i)], we have ||x(/) — .T(f')|| < C2\t — t'\. The above follows directly 
from the definition of ^, by letting C2 = M. 

Using the Arzela-Ascoly theorem li20l . it follows that ^ 7- is precompact. The 
convergence of sampled trajectories follows from Lemma[T] D 

6 Conclusions 

In summary, we have considered the problem of existence of resolution complete 
algorithms for motion planning with differential constraints. To develop the set of 
sufficient conditions, the most important step was to map the space of controls "^ 
to the trajectory space J^ through the system dynamics. By forming metric spaces 
we have induced topologies on ^ and ,9^ and shown that the map is continuous, as 
illustrated in the diagram, which updates Fig. [T] with the precise terminology from 
the paper: 

I, U X (26) 



x(xi,u) 

1* ^ '^ ^ .T. 

We have relied on the fact that ^ can be sufficiently sampled using primitives X 
to obtain 1,* , yielding a computational approach. Using this formulation, we have 
introduced a resolution-complete algorithm for motion planning with differential 
constraints in the most general setting known to date, requiring only Lipschitz con- 
tinuity of the system. 

We believe that metric space formulations and the resulting functional analysis 
framework may be practically useful for a broader class of systems and motion 
planning algorithms. For example, consider a system which is symmetric under Lie 
group transformations (e.g., ifTOl ) and suppose the dispersion of a given set of mo- 
tion primitives is limited by e > in the set of all trajectories that have the final 
time bounded by Z\/. Consider further the set of all strings of length A^ composed of 
these primitives and the set of all trajectories with the final time bounded by NAt. 
It follows from Lemma |5] that the dispersion of the former set in the later set is 
bounded by Ke, in which K depends only on A^, At, and L^, however, it is indepen- 
dent of any particular trajectory (we leave it to the reader to verify this fact). Hence, 
to reach the desired dispersion in the reachable set, it suffices to build primitives that 
approximate short-time trajectories with a given resolution. We also find our study 



In the provided reference the theorem is called Arzela's theorem, and the synonymous 
term "relatively compact" is used instead of "precompact". 
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to be complementary to path pruning techniques (e.g., fO'.TTl). Path pruning meth- 
ods improve the path diversity to reduce the branching factor of search methods, 
whereas our analysis provides a bound on the dispersion of the set of pruned paths 
to guarantee convergence. 
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Grasp Invariance 

Alberto Rodriguez and Matthew T. Mason 



Abstract. This paper introduces a principle to guide the design of finger form: in- 
variance of contact geometry over some continuum of varying shape and/or pose of 
the grasped object in the plane. Specific applications of this principle include scale- 
invariant and pose-invariant grasps. Under specific conditions, the principle gives 
rise to spiral shaped fingers, including logarithmic spirals and straight lines as spe- 
cial cases. The paper presents a general technique to solve for finger form, given a 
continuum of shape or pose variation and a property to be held invariant. We apply 
the technique to derive scale-invariant and pose-invariant grasps for disks, and we 
also explore the principle's application to many common devices from jar wrenches 
to rock-climbing cams. 



1 Introduction 

What principles should guide the design of finger form? It depends on context — the 
specific application, the hand design philosophy, and in particular on the function 
assigned to the fingers. In this paper we explore the possible role of the fingers in 
adapting to variations in object shape and pose. One common design approach is 
to adapt to object shape and pose by control of several actuators per finger. But for 
simpler hands, with one actuator per finger, or even one actuator driving several 
fingers, the job of gracefully adapting to shape and pose variations may fall on the 
finger form. This work explores grasp invariance over shape and/or pose variation 
as a principle for finger form design. 
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We begin with a geometry problem: 

Scale-invariant contact problem: Given O an object shape in the plane, p a point on 
the boundary of O, and c the location of the finger's revolute joint, find the finger shape 
that makes contact at p despite scaling of O. 




•?' _ 



Fig. 1 The scale-invariant contact problem (left), solved in Sect. 13.21 yields a finger whose 
contact with the object O is preserved for different sizes of O (right) 



If (9 is a smooth shape, a solution to the previous problem would preserve contact 
normal as well as contact location. As a consequence, many properties governing 
the mechanics of grasping and manipulation would also be preserved. For example, 
as we shall see later in this paper, if (9 is a disk of varying scale in the palm of a two- 
fingered hand, the solution of the scale-invariant contact problem yields identical 
equilibrium grasps despite the variation in scale. 

This paper generalizes the scale-invariant contact problem, and explores its impli- 
cations for finger design for grasping and fixturing problems. Scale-invariant contact 
is just one example of a broader class of problems where a continuum of constraints 
guides the design of finger shape. In the specific case of scale invariance, the set of 
constraints is generated by scaling the object. 

In Sect. [3] we show how, under certain conditions, the problem can be mathemat- 
ically formulated and admits a unique solution as an integral curve of a vector field. 
In Sect, m we find the analytical expression of the integral curve for the most simple 
and common cases. The integration yields spiral shaped fingers — in special cases, a 
logarithmic spiral, a shape long noted for its scale invariant properties fill. Finally, 
in Sects. |5] and |6l we show that the principle applies to fixture and finger designs 
seen in many common devices from jar wrenches to rock-climbing cams. 



2 Related Work on Finger Design 

This paper grew out of our interest in simple hands, focused on enveloping grasps 
of objects with uncertain pose and shape |!9|. The actual forms of the phalanges and 
the palm become important when the locations of contacts are not carefully planned 
and controlled. Simple hands adapt to varying shapes and poses by the emergent 
interaction of hand with object, rather than by actively driving the hand shape to 
conform to a known object shape and pose. 
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Several other approaches are evident in previous hand design Hterature. The 
enveloping grasps we employ are often contrasted with fingertip grasps, where the 
details of phalange form are not usually considered. The dichotomy between en- 
veloping grasps and fingertip grasps corresponds roughly to the distinction between 
power and precision grasps in Cutkosky and Wright's grasp taxonomy Q. 

While much of the analysis of grasp has focused on local stability and in-hand 
manipulation with fingertip grasps f3j, hand design research has explored various 
grasp types more broadly. The hand design leading to the Barrett Hand lfT4l [H ex- 
plored a single finger probe, a pinch grasp, both cylindrical and spherical enveloping 
grasps, a two-fingered fingertip grasp, and a hook grasp. The design of the DLR-II 
likewise was guided by the desire to produce both power and precision grasps ID. 

Unlike the work cited so far, our approach does not change the shape of the 
fingers to adapt to an object pose or shape. Instead, adaptation emerges from the 
interaction of fixed finger shapes with the object. In that respect the closest work is 
Trinkle and Paul's lfT3l work on enveloping grasps. Dollar and Howe's iQ work on 
hands with compliantly coupled joints, and Theobald et al.'s fW\ design of a gripper 
for acquiring rocks. None of the above focus on the actual finger form. Dollar and 
Howe |I6| survey 20 different designs of compliant and underactuated hands, and 
all employ cylindrical or straight fingers, occasionally employing some additional 
shape features, but with no particular principle described to guide their design. 

The closest work to this paper is found not in robotics research, but in the design 
of various tools and hardware. All the devices in Fig. |2] adapt to shape and pose 





S'*^ e) 



Fig. 2 Curved shapes used in: a) Truck door lock; b) Jar opener; c) Anti-kickback device for 
table saw; d) Pliers; e) Climbing cam 



variations without changing their shape. Instead, the adaptation arises from the 
curved shape. In the cases of the rock-climbing cam [8J and the anti-kickback device 
||2ll the shape was derived theoretically. The others, to our knowledge, are product 
of human intuition. In Sec. |6] we analyze their shape under the grasp invariance 
principle. 
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3 Problem Formulation 

Let H{x,s) be a parametrized transformation of O, i.e. for all values of the trans- 
formation parameter s, H{0,s) is a warped version of the object. We are interested 
in designing a finger that makes first contact with O at a given point p, and with 
points ps — H{p,s) for all warped versions of O. Locally, making first contact is 
equivalent to the object and finger being tangent. Under this formulation, the scale- 
invariant contact problem becomes: 

Transformation-invariant contact problem: Find the shape of the finger that is always 
tangent to the set of objects {H{0,s)}^ when contacting them at points {ps}^- 

Let contact curve I describe the travel of the point p, and let contact vector v be the 
tangent to the object at point p, Fig. [3] 

Contact curve / ■. 



Contact vector 



H{0,S3)\ 



Fig. 3 The motion of the 
object O defines the travel 
of the point p and with 
it, the contact curve / and 
the motion of the contact 
vector V 



Every point 4 in the contact curve corresponds to an instance of the object, 
H{0,s). As a consequence it induces a constraint at one point in the finger. This 
establishes a relationship between points in the contact curve and points in the fin- 
ger: Whenever the finger crosses the contact curve, i.e. contacts the object at ps, it 
must always do it tangent to the corresponding contact vector v.,, i.e. tangent to the 
object at ps- In the next section we see that this relation leads to a reformulation of 
the problem in terms of vector fields. 

3.1 Geometric Reformulation of the Problem 

The relationship between the finger and the contact curve described in the last sec- 
tion allows us to reformulate the transformation-invariant contact problem as: 

Transformation-invariant contact problem: Find the finger form that always crosses 
the contact curve / tangent to the contact vector v. 

Every time the finger crosses the contact curve, its tangent must satisfy one con- 
straint. Let r be the distance between the center of rotation, c, and that contact point. 
As shown in Fig.|4^, while the finger rotates around c, the same tangential constraint 
is propagated along an arc of radius r and center c. By repeating the same procedure 
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a) 



b) 



Fig. 4 a) Parallel transport of v radially from c. b) Extending the same construction to all 
values of s, gives us the vector field 



for all constraints, or equivalently for all points in the contact curve, we obtain a 
vector field V defined on an annulus in the plane. 

The vector field V is well defined if and only if the construction does not impose 
inconsistent constraints, that is, if and only if it does not impose two different con- 
straints at the same point. To avoid inconsistent constraints, the distance between 
c and the contact curve must be strictly monotonic, because different points along 
the contact curve will impose constraints on different concentric arcs. In practice, 
given a parametrized transformation H{0, s), we avoid inconsistency of constraints 
by restricting to an interval of s where such distance is strictly monotonic. We will 
refer to a problem that avoids inconsistent constraints as a proper problem. 

By construction, any integral curve of the vector field y of a proper problem must 
satisfy all contact curve constraints. Hence, if we shape the finger following the 
integral curve, it shall contact the object with the expected geometry, and therefore 
must be a solution to the transformation-invariant contact problem. 

By virtue of the the theorem of existence and uniqueness of maximal integral 
curves. Theorem [TJ the existence of the integral curve depends on the smoothness 
of the vector field V. If the contact curve / is smooth and v changes smoothly along 
it, the vector field V will also be smooth because V is the parallel transport of v along 
concentric arcs centered at c, which is a continuous and differentiable operation. If 
that is the case, the solution is guaranteed to exist for the restricted interval of s 
where the problem is a proper problem. 

Theorem 1 (Existence and Uniqueness of Maximal Integral Curves). Let X be 

a smooth vector field on an open set U 6 R"+^ and let p ^ U. Then there exists a 
unique and maximal integral curve (x{t) ofX such that a(0) = p. 

The theorem is a direct consequence of the fundamental existence and uniqueness 
theorem for solutions of systems of differential equations lfT2l . 
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3.2 General Solution Recipe 

The derivation of the previous section suggests a general procedure for obtaining 
the shape of the finger: 

1. Given the parametrized transformation of the object, H{0,s), construct the con- 
tact curve / and the set of contact vector constraints v. 

2. Obtain the vector field V, by rotating / and v around the rotation center. 

3. Find the shape of the finger by integrating V. 

Figure |5] shows an example of the procedure applied to the scale-invariant contact 
problem for a disk-shaped object resting in a planar palm. When scaling the disk, 
the contact curve / becomes a line and the contact vector v is constant along I. 



Stepi 






Fig. 5 Step by step execution of the general recipe to obtain the form of the finger for the scale- 
invariant contact problem. Step 1 Construction of the contact curve / and contact vectors v. 
Step 2 Vector field V. Step 3 Integral curve of V 



4 Analytic Solution 

In the general case we can find the shape of the finger using the recipe in Sect. 13.21 
and numerically integrating the vector field. In some specific cases, namely when 
the contact curve / is a line and the contact vector v is constant along it, we can also 
integrate analytically the vector field. This is the case of linear scaling of the object, 
translation of the object, or any linear combination of scaling and translation. 

Let {x,y) and {r, 9) be the cartesian and polar coordinates in the plane. The shape 
of the finger is the solution to the system of first order differential equations: 

y = Vy ^ ^ 

where V = {Vx,Vy) is the vector field obtained in Sect. 13.21 The identity (O re- 
lates the derivatives of the cartesian coordinates to the derivatives of the polar 

coordinates: 

dy _ r'(0) sine + r(0) cos 



dx r'(0)cos0-r(0)sin0 
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With it, we can rewrite the cartesian system ([T]) as the single polar differential equa- 
tion (O that we will solve in the next subsections. 



Vy _ /(0)sin0 + /-(0)cos0 
V^^ r'(0) COS 0-r(0) sine 



(3) 



Without loss of generality we suppose the contact curve / to be parallel to the Y 
axis and the center of rotation c to lie on the X axis. Let Oo be the constant angle 
between / and the contact vector v. Depending on the relative location of c and / we 
distinguish three cases: (I) c lies on /; (II) c is at finite distance from /; and (III) c is 
at infinity, i.e. the finger translates rather than rotates. 

4.1 Case I: Rotation Center on the Contact Curve 

We assume / to be the Y axis and c to be located at the origin as in Fig.|6l 



Fig. 6 Parallel transport 
of vo along the arc with 
center at c gives us Vij^yu 
the tangent vector to the 
finger at (x,y) 



Rotation 
center 

(0,0) 




vo .On 



\ 


Parallel 




transport 


1 


vo 


t / 

; / 





In this case, the differential equation (O becomes: 



— = tan (^Oo + + - 



Solving for /(e): 

and integrating, we obtain: 



/(O) 



_ /-'(0)sin0 + r(0)cos0 
~ r'(0) cos 0-r(0) sine 

-r(0)tano!b 



r(0)=Ce-'^'™«« 



(4) 



(5) 



(6) 



where C is the integration constant. Solution ^ is the equation of a logarithmic spi- 
ral with pitch 5 + tan^ ' (cot Oq) . The solution could have been anticipated from the 
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diagram on Fig.|6]because the angle between the radial line and the vector tangent 
to the curve is constant. This is characteristic of logarithmic spirals and gives them 
their scale invariant properties ifTTl . as we will further see in Sect.|6l 

4.2 Case 11: Rotation Center at Finite Distance from Contact 
Curve 

Without loss of generality, let c be the origin and let / be parallel to the Y axis, 
crossing the X axis at (1,0), as in Fig.|7l 



Contact line 



Fig. 7 Same construction as 
in Fig.|6] Parallel transport 
of vq along the arc with 
center at c gives Vi^ y), the 
tangent vector to the finger 
at {x,y) 




I / Parallel 
(1.0) 1/ transport 
f vo 



If we apply Q to the construction of Fig. [7] we obtain: 



K 



- =tan(ab + + /3)=tan 



Oq + + cos 



r'(0)sin0 + /-(0)cos0 
r'(0) cos 0-r(0) sine 



With some trigonometric algebra, (|7]l can be solved for /{O) as: 



(7) 



r'(0) = r(0) 



cos Oo — \/r{9y — Isin Oq 
sin Oq + \/r{9)~ - Icos Oq 



(8) 



Equation ([8ll is a separable differential equation with form 4i — g{r) and can be 
solved as fdO= f -j-^dr : 



9 = 



/--/ 



1 sin Oo + y/r{9y - 1 cos Oq 



ri9) cos Ob - ^/r{9)^- Isin Oq 



dr = 



change 



f^vM^F^ 
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f (sinob + fcosoo) 



•^ + 1 (cos Oo — r sin Oo 

1 COSOo 



-dr = 
-dr = 



t^ + l sin Ob? — cos Oo 
= -tan-' (0 - l"(|sma,)/-cosao|) ^ ^ ^ 
tanoo 



tan 



, . ln( jsinOoVr^— 1 — cosool , 

(Vr'--l) ^ ^+C (9) 

V J tanoh 



where C is the integration constant determined by the initial conditions. As shown 
in Fig.m the solution is a family of spirals valid for all Ob, except when tan Oo = 0. 
In that specific case, a similar derivation yields a different spiral: 

0(r) = -tan-' (\/r^-l)+\/r--l + C (10) 

4.3 Case HI: Rotation Center at Infinity 

A center of rotation at infinity corresponds to the case where the finger does not 
rotate but translates in a particular direction v^, i.e. the finger joint is prismatic rather 
than revolute. The vector field V is defined, then, by the translation of the contact 
line in the same direction. As we are limiting the analysis to the case where the 
contact vector is constant along the contact line, the vector field V is constant on 
the plane. The integral curve of V, and shape of the finger, is consequently a line 
segment aligned with the constant contact vector vq. 

The vector field is well defined except in the case when the translation of the 
finger v^ is aligned with the contact line. In this degenerate situation, the vector field 
is only defined on top of the contact line, and the solution is still a line. 



4.4 Family of Solutions 

In cases I and II, the integral curve of the vector field is a spiral. In case III, we 
always obtain a line segment. Figure [8] shows a summary of the solutions we obtain 
when changing Oq. Note that the vector fields obtained with Oq and Oq + ;r always 
have the same magnitude and opposite direction, hence it suffices to analyze the 
range [f , — f ] ■ In Fig.[8]we show a long section of the spiral. However, for practical 
reasons, the actual shape of the finger is just the initial portion. 
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Fig. 8 Plots show the contact curve (vertical bold line), the rotation center (grey dot), and 
the finger solution (grey curves) for different values of Oq. In all cases, the finger and the 
contact curve cross at the lower half of the contact curve with constant angle, equal to the 
corresponding Oq. Case I Rotation center lying on the contact curve. Case II Rotation center 
at finite distance from the contact curve. Case III Rotation center at infinity 
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5 Applications to Grasping 



5.7 Scale Invariant Grasping 

The solution to the scale-invariant contact problem in Fig.[T]shows a finger that con- 
tacts a disk of varying size with constant geometry. This same idea can be used to 
design a gripper whose equilibrium grasps are geometrically invariant with scale. 
Suppose we aim for a triangular grasp between two fingers and a palm. Fig.|9]shows 
the corresponding contact line /, contact vectors v, and the induced normalized dia- 
gram. The solution belongs to the family of spirals obtained in Case II in Sect. 14.21 
where the center of rotation c is at finite distance from the contact line /. 
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Fig. 9 a) Contact curve, /, and contact vector, v, of the scale invariant grasping application. 
b) Corresponding normalized diagram with Oq = j5 



The contact line, the contact vector, and the form of the finger, depend on the ob- 
ject O and the type of contact we aim for. Therefore, the form of the finger depends 
on the task to solve. For the specific example of task on Fig. [9l we can construct 
a scale-invariant gripper by combining two identical but symmetric fingers as in 
Fig.[10l 




Fig. 10 Gripper with scale invariant fingers for grasping spheres with a regular triangular 
grasp 
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5.2 Pose Invariant Grasping 



Here we aim to design a hand whose grasps are invariant with respect to the location 
of a given object rather than scale. Suppose again that we want to grasp a disk of 
a given size with a triangular grasp, against a planar palm. Now, the disk can be 
located anywhere along the palm, and we want the grasp geometry to be invariant 
with respect to that displacement. Figure [TT| shows the corresponding contact line 
/, contact vectors v, and normalized diagram. The solution belongs to the family of 
spirals in Case II in Sect. 14.21 
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b) 



Fig. 11 a) Contact curve, /, and contact vector, v, of the pose invariant grasping application. 
b) Corresponding normalized diagram with Oo = J3 



Again, the contact line, the contact vector and consequently the form of the finger 
depend on the object O and the type of contact desired. The diagram corresponds to 
case II in Sect. 14. 21 with the rotation center at a finite distance from the contact line. 
In Fig. [12] two identical but symmetric fingers compose a pose-invariant gripper. 




Fig. 12 Invariant grasping geometry for different locations of a sphere 
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5.3 Pick-Up Tool 

Suppose we are to design a gripper with two rigid fingers to pick up an object from 
the ground. The object needs to slide along the length of the fingers while it is being 
lifted, similar to Trinlde and Paul's work on dexterous manipulation with sliding 
contacts lfT3l . Because of the critical role that contact geometry plays in the sliding 
motion, complex lift plans can be simplified if the contact geometry between finger 
and object were to be invariant with respect to the lifting motion. With that in mind, 
we can use the grasp invariance principle to find the finger shape that preserves a 
contact suitable for sliding. Figure[l3] shows a gripper designed to pick up disks. 




Fig. 13 Invariant contact geometry of the pick-up tool when lifting an object from the ground 

6 Applications to Common Devices 
6.1 Rock-Climbing Cam 

A spring loaded cam is a safety device used in rock climbing to secure anchor points 
in cracks in the rock face. The device uses the mechanical advantage of the wedge 
effect to convert pulling force into huge friction forces. Fig. fT4l 





Fig. 14 a) Spring loaded cam as used in rock-climbing, b) Static force diagram 



The diagram on Fig.[T4b reveals an important relationship between the coefficient 
of friction jx and the cam angle /3 . In static equilibrium (zero torque at c): 

r- Ff cos P — r- F^sinli 

[Ff<^FN] FNtiinj3<nFN 

j3<tan"V (11) 
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where r is the distance from the cam rotation center c to the contact point, Ff^ is the 
normal force caused as a reaction to the pulling force Fp, and Ff is the frictional 
force. To design a cam whose loading pattern is invariant with the size of the crack, 
d, we can integrate the constraint (fTTI l so that jS remains constant despite variations 
in d. The rotation center c lies on top of the contact line / and hence the solution is a 
logarithmic spiral with Oq = jS as shown in Fig. [15] example of Case I in Sect. 14. II 







Fig. 15 Invariant loading pattern for different wall openings when using a logarithmic spiral 
shape 



The logarithmic spiral has been used for decades in the design of climbing cams. 
The invention of modern rock-climbing cams is attributed to Raymond Jardine lO. 
His invention used a logarithmic spiral, with camming angle /3 = 13.5°. 

6.2 Anti Kickback Device for Table Saws 

Table saw kickback happens when the blade catches the workpiece and violently 
throws it back to the front of the saw, towards the operator. An anti kickback device 
is a passive device that only allows forward motion of the workpiece. Among several 
options to introduce the asymmetry, one option is to use a spiral like rotating part 
that wedges a workpiece trying to move backwards, Fig.[T6l 




Fig. 16 Anti kickback device as used in a wood table saw. The contact between the retaining 
part and the workpiece is at an optimum angle /3 



Experiments have determined that the optimal contact between the workpiece and 
the anti kickback device is reached when /3 = 8° Q. To make the contact invariant 
with the thickness of the workpiece, we can use the grasp invariance principle to 
design the shape of the device. The center of rotation lies on top of the contact line / 
and consequently the optimal solution is a logarithmic spiral, with Oo = /3 + f . The 
patent of the device |i2J proposed a logarithmic spiral as the optimal contour. 
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6.3 Jar Wrench 

The jar wrench in Fig.[2j3 can open jars of varying sizes. Figure [17] shows the basic 
operating principle. The lid contacts the inner disk and the outer contour at differ- 
ent places for different sized lids. The mechanics of the opening procedure, and 
specifically the amount of friction available, depend of the value of the angle /3 . 




a) \:r__-^ b) 

Fig. 17 a) Jar wrench, b) Simplified diagram where the discs are fixed and the wrench rotates 

If we know the optimal value of /3, and want the mechanics to be invariant across 
the range of jar lid sizes, we can use the grasp invariance principle to design the 
contour. To simplify the analysis we suppose, as shown in Fig.FTTb. that the lids are 
fixed and resting always against the same contact point, while the wrench rotates to 
contact the lid. The center of rotation is on top of the contact line /. Consequently 
the optimal shape of the jar wrench is a logarithmic spiral, with Oo = ^ — /3 . 

7 Discussion 

In this paper we have introduced the grasp invariance principle and have given a 
recipe to apply it to design the form of rotating fingers and fixtures, given a contin- 
uum of shape or pose variation and a property to be held invariant. We have shown 
that, in simple cases, the finger has a spiral form, and have applied the technique 
to design scale-invariant and pose-invariant grippers for disks and a pick-up tool. 
Finally we have analyzed the shape of few common devices under the principle. 

There are lots of extensions, generalizations and applications that we lacked 
either the space to discuss or the time to explore. Here we mention a few of them: 

• Open design questions: Where should we put the center of rotation c? How do 
we choose the transformation H{x,s) function? The finger shape depends both 
on c and H{x, s) . Those parameters can be chosen to satisfy additional properties. 

• This paper covers design issues of rotating and prismatic fingers. There are other 
types of finger motion worth considering, for example those involving linkages. 

• 2D designs are readily adapted to design grasp invariant 3D grippers, for example 
by arranging three fingers symmetrically around a circular palm. However we can 
also think about a deeper 3D generalization where fingers become 2D surfaces. 

• What happens when we deal with non-smooth boundaries? Can we extend the 
approach to include contact at vertices? 
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• The grasp invariance principle applies to one-dimensional variations of shape or 
pose of the object, e.g. scale-invariant or pose-invariant grippers. How can we 
trade off various objectives to address generalization across objects and tasks? 
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Path Planning on Manifolds Using Randomized 
Higher-Dimensional Continuation 



Josep M. Porta and Leonard Jaillet 



Abstract. Despite the significant advances in path planning methods, problems in- 
volving highly constrained spaces are still challenging. In particular, in many sit- 
uations the configuration space is a non-parametrizable variety implicitly defined 
by constraints, which complicates the successful generalization of sampling-based 
path planners. In this paper, we present a new path planning algorithm specially 
tailored for highly constrained systems. It builds on recently developed tools for 
Higher-dimensional Continuation, which provide numerical procedures to describe 
an implicitly defined variety using a set of local charts. We propose to extend these 
methods to obtain an efficient path planner on varieties, handling highly constrained 
problems. The advantage of this planner comes from that it directly operates into 
the configuration space and not into the higher-dimensional ambient space, as most 
of the existing methods do. 



1 Introduction 

Many problems require to determine a path between two points, fulfilling a given set 
of constraints. In Robotics, this appears for instance in parallel manipulators ||35]| . 
robot grasping 1251 . constraint-based object positioning |24|, surgery robots [1], and 
humanoid robots f2U\. This situation also appears in Biochemistry when searching 
for low energy paths between different molecular conformations lf38l . In all these 
cases, the constraints expressed as a set of equations reduce the configuration space 
to a variety composed by one or more manifolds embedded in a higher-dimensional 
ambient space, defined by the variables involved in the equations. Approaches that 
try to directly describe these manifolds exist, but they are either too complex to be 
applied in practice |6|, or limited to particular architectures |28|. The adaptation of 
sampling-based planning methods is also cumbersome since, sampling in the ambi- 
ent space, the probability of the samples to lay on the configuration space is null. 
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Fig. 1 RRTs with 500 samples. Blue crosses represent the tree nodes and red lines the con- 
nections between them, a When the ambient space is a box tightly enveloping the sphere, the 
exploration is relatively homogeneous, b When the box is elongated along the vertical axis, 
an unwanted bias penalize the exploration. 



Consequently, several methods have been devised to find points of the configuration 
space from points of the ambient space. 

The Kinematics-based Roadmap method fT2ll samples a subset of variables and 
uses inverse kinematics to find all the possible values for the remaining ones. This 
strategy is only valid for particular families of mechanisms, and although some im- 
provements have been proposed [8 1, the probability of generating invalid samples is 
significant. Moreover, the presence of singularities in the subset of variables solved 
via inverse kinematics complicates the approach lITTl . 

An alternative strategy to get a valid configuration is to use numerical iterative 
techniques, either implementing random walks |39|, or the more efficient Jacobian 
pseudo inverse method 1 2 , 9 , 3 1 . All these approaches only perform properly when 
the ambient and the configuration spaces are similar. If the constraints define one 
or several complex surfaces with many folds, a uniform distribution of samples in 
the ambient space will not translate to a uniform distribution in the configuration 
space and this heavily reduces the efficiency of the sampling approaches. This prob- 
lem may appear even in simple cases such as the one described in Fig. [T] where a 
Rapidly-exploring Random Tree (RRT) is built on a sphere from points sampled in 
a 3D ambient space. If the sphere is not centered in, and tightly enveloped by the 
ambient space, the sampling process is biased and the result is a poor exploration of 
the solution variety. The lack of prior knowledge on the variety structure makes it 
hard to forecast whether or not a sampling-based approach would be successful. 

One way to limit the problems of mismatching between the two spaces is to focus 
the sampling on a subset of the ambient space around the configuration space B2ll . 
However, even in the case where the ambient and the configuration spaces are some- 
how similar, samples are thrown in the ambient space that can be of much higher 
dimensionality than the configuration space. Um et al I.36J sketch a RRT scheme 
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Fig. 2 Atlas of the sphere obtained by Higher-dimensional Continuation. Each polytope is a 
chart that locally parametrizes the sphere, a The full atlas includes about 500 charts, b The 
part of the atlas explored with our approach for connecting the two poles. Only about 30 
charts are generated. The solution path is shown as a yellow line. 



where the tree is defined in the tangent of the configuration space, which is of the 
same dimensionality as the variety. However, the overlap between tangent spaces at 
different points can lead to an inappropriate sampling bias and points in the tangent 
space do not actually fulfill the equations defining the variety. Ideally, one would 
like to sample directly on the configuration space. A uniform sampling over this 
space typically relies on a global parametrization. In some families of mechanism 
distance-based formulations provide this parametrization IIT31I3TI . some approaches 
try to infer it from large sets of samples 1 14|, and task-space planners assume that a 
subset of variables related with the end-effector are enough to parametrize the con- 
figuration space 1 40, 27 1. However, it is in general not possible to obtain a global 
isometric parametrization of the configuration space. 

From differential geometry, it is well known that a variety can be described by an 
atlas containing a collection of charts, each chart providing a local parametrization 
of the variety |22|. Higher-dimensional Continuation techniques (see 1 17| for a sur- 
vey) provide principled numerical tools to compute the atlas of one of the connected 
components of an implicitly defined variety, departing from a point and avoiding 
overlap between neighboring charts. For instance. Fig. [2^ shows the atlas obtained 
with the most recent of these techniques |15| in the toy problem of the sphere. 
One-dimensional continuation methods (also known as path following, homotopy 
or bootstrap methods), have been strongly developed in the context of Dynamical 
Systems |fT9ll . whereas in Robotics, they have been mainly used for solving problems 
related to Kinematics |[26ll29]| . To our knowledge, Higher-dimensional Continuation 
tools have not been used in Robotics. 

In this paper, we extend the tools developed for Higher-dimensional Continua- 
tion to the context of path planning. We define the concept of partial atlas connect- 
ing two configurations, dealing with the presence of obstacles. We also introduce 
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the random exploration of a variety focused towards a target configuration. As a 
result, we obtain a Higher-dimensional Continuation planner (HC-planner) for 
highly constrained systems that clearly outperforms existing approaches. Figure|2j) 
shows an example of path found with our approach for the sphere toy problem. Note 
how only a small subset of all the atlas charts is needed to find a path connecting the 
two query points. 

Next section provides a description of the tools for High-dimensional Continua- 
tion. Section[3]proposes an extension of these tools to the context of path planning. 
Section |4] compares the performance of the planner with respect to existing meth- 
ods for several benchmarks. Finally, Section |5] summarizes the contributions of this 
work and indicates points that deserve further attention. 



2 Higher-Dimensional Continuation 

Next, we describe the main algorithmic tools appearing in ITSll . By generalizing the 
one-dimensional pseudo-arclenght procedure, these tools allow the generation of an 
atlas for describing a ^-dimensional smooth variety implicitly defined by a system 
of equations 

F(x) = , (1) 

with F : M" ^ M"^*^, n> k>0. Figure[3]illustrates the main idea on which relies the 
approach. Given a point x,- on the variety, we can define O,, an orthonormal basis of 
the tangent space of the variety at this point. This is the nx k matrix satisfying 

with J(x,) the Jacobian of F evaluated at x, and I the identity matrix. The pair (x,, O,) 
defines a chart, C,-, that locally approximates the variety. To extend the atlas, the root 
point, Xj, of a new chart, Cj, can be obtained by first generating a point, x^, using 
the tangent space of C, 

x/ = x,+0,u/, (3) 

with u/ a A:-dimensional vector of parameters. Then x^ is the orthogonal projection 
of Xj into the variety. This projection is obtained by solving the system | 

F(x,-)=0, 
oT(x,-x,-)=0, 



using a Newton procedure where Xj is initialized to Xj and where at each iteration Xj 
is updated with the increment Ax, fulfilling 






Ax, = -(^T..; '. ^ )■ (5) 
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Fig. 3 Higher-dimensional Continuation method applied to a two-dimensional manifold em- 
bedded in a 3D ambient space, a A chart is defined from the tangent space at a given start 
point X,. The area of applicability of the chart is denoted as Vi- A point Xj defined using the 
tangent space at x, is orthogonally projected to the manifold to determine xy, the root point of 
the next chart, b The new chai't locally parametrizes a new region of the manifold. The area 
of applicability of the new chart is Vj, that does not overlap with "P,-. 



The update is applied until the norm of the right-hand side of the previous system 
becomes negligible or for a maximum number of iterations. When a valid x,- is 
determined, we can define a new chart Cj, as shown in Fig. [3J3. The intersection 
between tangent spaces marks the boundaries of applicability of the corresponding 
charts, denoted as V, and Vj, respectively. When C, is fully surrounded by other 
charts, Vt becomes a convex polytope. Note that Vt is defined in the tangent space 
associated with C,- and, thus, it is a polytope in a ^-dimensional space and not in the 
much larger n-dimensional ambient space. 

The algorithm proposed in [ 15 1 gives a systematic way to define new charts and 
to generate the associated polytopes. In this work, V, is initialized as an hypercube 
enclosing a ball, Bi, of radius r, as illustrated in Fig.|4^. The polytope is represented 
using a set of faces that intersect defining a set of vertices |7|. A vertex v of Vt 
external to Bi, can then be used to generate a new chart. From this vertex, a point x,- 
on the surface of Bi is defined using Eq. (O and 

u/ = Av. (6) 



If C, and the new Cj generated from u/ are too far or too different, i.e., if 



|oTo,||<l-a, 



(7) 
(8) 



the new chart is discarded and a new attempt of chart generation is performed from 
a set of parameters u/ closer to x,. When Cj is valid, it is used to refine Vi from the 
intersection between Bi and Cj, the projection into the tangent space of C, of the part 
of the variety covered by Cj. This projection is approximated by a ball, Bj, included 
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Fig. 4 Polytope-based chart construction, a The initial polytope V; is a box including a ball 
of radius r around x,-. b The polytope is refined using a ball Bj that approximates Cj, the 
projection of a neighboring chart into the current chart. 



in Cj, as shown in Fig.|4}5. For this approximation to be conservative, the radius of 
Bj is scaled by a factor a, < a < 1, depending on the angle between sub-spaces 
spanned by O, and O^. 

The hyperplane defined by the intersection of Bj and Bj can be computed by 
subtracting the equations for the two balls. As shown in Fig.HJ^, this plane defines 
a new face of Vi that eliminates some of its vertices (in particular the one used to 
generate Cj) and generates new ones. Vj, the polytope associated to Bj, is cropped 
using the projection of C, into Cj. 

When all the vertices of the polytope of a chart are inside the associated ball, 
the chart cannot be further expanded as the domain for this chart is fully bounded. 
This process of chart expansion continues as far as there are open charts. At the end, 
the connected component of the variety containing the initial point is fully covered 
by a set of chats whose area of validity is bounded by the corresponding polytopes 
(see Fig. [2^). To fully characterize the connected component of the variety, higher- 
dimensional continuation tools need to consider configuration space singularities, 
where the variety bifurcates fT6l. Here, we consider only the case were the Jacobian 
of F at singularities is of rank n — ^ — 1 . In this case, the singularities form a zero- 
measure set that can be located by monitoring an indicator function, X^s), whose 
value is different for two points on opposite sides of the singularity set fT|. When 
located, singular points are accurately determined using a dichotomic search and 
the additional vector of the null space of the Jacobian at the singular point is used 
to define a point on the other branch of the variety. This point is used as a root for a 
new chart from where to start the coverage of the additional branch of the variety. 

The cost of the algorithm at each step is dominated by the cost of two searches 
among the set of charts: one to find an open chart and another to find the potential 
neighbors of a new chart. The first search can be saved keeping the open charts in a 
list. The performance of the second search can be increased using a kd-tree storing 
the root points of the charts. 
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3 Path-Planning on Manifolds 

Using the tools described in the previous section, a graph can be built where nodes 
are the chart roots and edges represent the neighboring relations between charts. 
Thus, the shortest connecting two given points can be easily computed using a stan- 
dard graph search method such as A* considering only the collision-free transitions 
between the chart roots. This procedure defines an optimal, resolution complete path 
planner, but it is only practical for low dimensional varieties, specially if we have to 
use charts with small applicability areas. If we define charts with large radius, the 
presence of obstacles becomes an issue. If we use a coarse resolution in an environ- 
ment with many obstacles, most of the transitions between chart centers will be in 
collision and it will not be possible to find ways out among obstacles. 

Herein, we propose modifications to the Higher-order Continuation procedures 
to deal with the curse of dimensionality and the presence of obstacles. First, we 
take advantage of that path planning is only concerned with the path between two 
given configurations and not with the full atlas generation, which allows to save 
the construction of many unnecessary charts. Second, to deal with the presence of 
obstacles, we randomize the process of atlas extension and adapt the generation of 
charts to the presence of obstacles. 

3.1 Chart Selection: Focusing on the Path to the Goal 

As aforementioned, the atlas structure can be represented by a graph where nodes 
are the charts and edges are the neighboring relations between charts. To guide the 
search toward the goal, we use a Greedy Best-First search where the chart to expand 
is the one with minimum expected cost to reach the goal. The cost for a chart C, is 
heuristically evaluated as 

h{i)=l5"-\\x,-xg\\, (9) 

where x^, is the goal configuration, /3 > 1 a fixed parameter, and «, is the number of 
times a chart failed to expand. Thus, the term /3"' prevents the search to get stuck 
in dead ends. As soon as the goal is connected to the rest of the atlas, the search is 
stopped. 

Observe that using a Greedy Best-First search, we do not necessary generate all 
the neighbors of the chart under expansion. The generation of children charts pro- 
ceeds only while the children have higher cost than the parent. This largely reduces 
the generation of charts. 

Finally, note that due to the use of a Greedy Best-First algorithm the final path 
is not necessarily optimal. As mentioned, the generation of a (resolution) optimal 
path would require the use of an A* algorithm, the generation of all the neighbors 
of the node/chart under expansion, and the use of a small resolution to deal with the 
presence of obstacles. In general, this implies to generate too many charts, hindering 
the practical applicability of the approach. 
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3.2 Chart Expansion: Generating Random Directions 

When the chart to be expanded is selected, the expansion point for the atlas is se- 
lected at random. This is achieved by sampling a point uniformly on the surface of 
the ball associated with the atlas and checking if this point is inside the associated 
polytope. If it is the case, the generation of the new atlas proceeds as detailed in 
Sect. [21 

The uniform generation of random points, u/ , on a the surface of a ^-dimensional 
ball is done by generating the point elements according to a normalized one- 
dimensional Gaussian and scaling the resulting vector to norm r ifTOl . 

To verify if the point is inside the associated chart's polytope, we exploit the fact 
that convex polytopes are defined as the intersection of ^-dimensional hyperplanes. 
Thus, for a point u/ = {mi , . . . , m^.} to be inside the polytope V, made of m,- faces, it 
must fulfill 

k 

/o + El^«.v>0, (10) 

.s = l 

for all the faces /' = {Yq,--- ,Yi,),t — 1,. . . ,m,- defining 'P,-. 

If the point is inside the polytope, it is approached through small incremental 
steps of size 5. The intermediate points are successively projected on the manifold 
using Eqs. Q and (|5| and then checked for collision. The last collision-free config- 
uration is used as a root for a new chart. This adjusts the distribution of charts to the 
free configuration space. If no progress at all can be done towards the target point, 
the expansion is declared as failure and the chart under expansion is penalized by 
increasing «,-. Since collisions are not checked between intermediate points, 5 has 
to be set small enough so that only minor interpenetrations could occur. 

Observe that while the chart to extend is selected greedily, the exact expanding 
direction is selected randomly, favoring the exploration of alternative paths in the 
presence of obstacles. 



3.3 Algorithm 

Algorithm [T] corresponds to the HC-Planner, implementing the path planning ap- 
proach introduced in this paper. The algorithm takes x, and x^ as start and goal 
configurations respectively, and tries to connect them with a path on the variety im- 
plicitly defined by a given set of constraints F, as expressed in Eq. ([T]l. The process 
begins by initializing two charts associated to the two query configurations (lines 1 
and 2). Each chart includes the root point x, the base of the tangent space O, the 
ball B and the polytope V limiting the area of applicability of the chart. The two 
charts are then included in the initial atlas, A (line 3). To efficiently determine the 
chart with the minimum expected cost, charts are organized into a binary heap. 
Thus, the cost-to-goal of the start configuration is evaluated (line 4) and used to ini- 
tialize the heap (line 5). In lines 6 to 16, a greedy search is performed as described in 
Sect. l3.1l while the two query configurations are disconnected. At each iteration, we 
extract C,, the most promising chart from the heap (line 7) and if the polytope Vi of 
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Algorithm 1: High-dimensional Continuation path planner. 
HC-Planer(x5,Xj,F) 

input : A couple of samples to connect x.,, x^, a set of constraints F. 
output: A path connecting the two samples 

1 C, ^InitChart(xj,F) //C, = {x,,<I',,S„n} 

2 Cg ^ lNITCHART(Xg,F) // Cg = {Xg,(bg,l3g,Pg} 

3 A^{Cs,Cg} 

4 h{s) ^ \\xs-Xg\\ 

5 H ^lNn}iEAP(Cs,h{s)) 

6 while not CoNNECTEDfA,Cj,Cjj do 

d ^EXTRACTMIN(//) // d = {x;, <D;, S;, P,} 

ifP/^S,then 

Cj ^GenerateNewChart(C;,F) 



9 
10 

11 

12 

13 
14 
15 
16 



ifC; = 0then 

h{i)^ph{i) 

H ^AddToHeap(C,-, /!((■)) 

else 

A ^ A u {Cj} u SingularChart(C,-, Cj) 

HJ)^ l|xj-x?|| 

H ^AddToHeap(C;,;!(7)) 



17 RETURN(PATH(A,Cj,C„)) 



this chart still has vertices outside the ball Bt (line 8), we try to extend the atlas with 
a new chart (line 9). If the extension fails (line 10), the current chart is penalized so 
that its chance to be selected for future extension decreases (line 11), and the chart is 
added to the heap with the updated cost (line 12). If the atlas extension succeeds, the 
new chart is added to the atlas, updating the neighboring relations between charts 
(line 14). Next, the heuristic-to-goal is initialized for the new chart (line 15) and 
added to the atlas (line 16). When the goal is reached, a graph search procedure can 
be used to extract the path linking the query configurations via the roots of some 
of the charts in the atlas. Every time a new chart is added to the atlas, we check 
whether the line connecting the roots of the parent and the child charts crosses a 
singularity. If so, the singular point is located and an additional chart is defined such 
that its root is at the singularity and its tangent space is aligned with the branch of 
the variety that does not contain C, and Cj. Function SingularChart (line 14) 
implements this process and returns the new chart or and empty set if there is no 
singularity between C,- and Cj. 

The generation of a new chart from a previous one is presented in Algorithm^ 
We select a point u^' on surface of the ball defined on the tangent space of the input 
chart (line 2), as described in Sect. 13.21 If the point is inside the polytope (line 3), 
i.e. , the point is in the area of influence of the current chart, we proceed to determine 
a point, Xj, adequate to generate a new chart. This point is searched from the root of 



346 



J.M. Porta and L. Jaillet 



Algorithm 2: Generation of a new chart. 



GenerateNewChart(C, , F) 

input : A chart to expand C,- - 
output: A new chart Cy 



{x;,<J>,-,Z?;,'P,}, a set of constraints F. 



1 c 


/ ^ 











2 u/ ^RandomInBall(B,) 


3 if 

4 
5 
6 
7 
8 


u,' e P, then 

e ^True // Small error with respect to C; 

c ^True // Collision-free 

t ^True // Tangent space similar to C; 

while s<r and e and c and ? do 


9 
10 
11 
12 




Xj ^X;+<l);iu//r 

Xj ^ PROJECT(C,-,Xy,F) 

if xy — Xjll > CT then 

1 s^False 


13 
14 
IS 




el 


se 
if CoLLisiON(x;) then 
1 c ^False 


16 
17 
18 
19 






el 


se 

<bj ^TANGENTSPACE(xy,F) 

if <D|<1)j < 1-CT then 
|_ t ^False 


20 
21 

22 








e 


se 

Cj^InitMap(x;,F) // 


23 R 


ETU 


RN( 


Cy) 







{x;,*i,Sj,Pj} 



the chart under expansion, progressively moving to the target point with incremen- 
tal steps of size 5. At each step, we project the point from the tangent space to the 
manifold (lines 9-10), implementing Eqs. Q and (|5|. If the projection converges 
to a point in the manifold, we check whether the obtained point is too far away 
from the tangent space (line 1 1), whether it is in collision (line 14), and whether the 
tangent space at the new point, computed using Eq. (|2ll, and that of C, are too dif- 
ferent (line 1 8). In any of these cases, the progress towards the new point is stopped 
(lines 12, 15, and 19) and we return the chart for the last valid point (line 23), if any. 
The main operations of the HC-planner scale as follows. The initialization of a 
chart scales with 0(n^ + 2'''), with n the dimensionality of the ambient space and k 
the dimensionality of the configuration space, since we use a QR decomposition to 
identify a base of the kernel of the Jacobian of F and we have to define a box with 2*^ 
vertices. The initialization of the heap is 0(1) and the extraction and removal of its 
minimum element is O(^). The generation of a new chart scales with O^tr" + A:2*) 
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since it includes the five following steps: 1) the generation of a random number on 
a ^-dimensional ball that is 0{k); 2) the check to determine if a point is inside a 
fe-dimensional polytope, that scales as 0{k2^) since each face is defined by a A: + 1 
dimensional vector and the number of neighbors of a chart grows with the kissing 
number that is 0{2^); 3) the projection of a point from the tangent space to the 
manifold, a Newton process with a bounded number of iterations, where at each 
iteration we use a QR decomposition that is 0{rr')\ 4) the determinant of a matrix 
of size k, 0{k^), that comes from the product of m x ^ matrices that is 0{k^ n); 
and, finally, 5) the initialization of a chart, 0{rr' + 2^). When adding a chart to the 
atlas, we have to look for neighboring charts. This can be done in 0{k) since it is 
logarithmic with the number of charts that, in the worst case scales exponentially 
with k. For the neighboring charts, we have to crop the corresponding polytopes. 
This operation scales with the number of vertices of those polytope which is 0{2^). 
Finally, the addition of an element to the heap is 0{k), in agreement with the cost of 
determining the neighboring relations between charts. 

Summarizing, if / is the number of charts needed to connect the start and the goal 
the overall algorithm scales with 0{l {rP +k2'')). In very constrained problems, as 
the ones we consider, k <^n and the cost is dominated by 0{lrr'). In the worst case, 
the final atlas might include all the possible charts for a given manifold and / is 
exponential in k and independent of n. However, as we show in next section, many 
problems require in practice a limited number of charts to connect the start and goal 
configurations. 

Note that the planner is resolution complete, in the sense that by taking a radius r 
small enough we can ensure to find a solution path if it exists. In particular, in 
problems involving narrow passages of minimum width v, setting r < xi/2 would 
ensure a solution. However, in practice, much larger radius can be used safely. 



4 Experiments 

We implemented in C the higher dimensional continuation tool^ described in 
Sect. [2] and the HC-planner described in Sect. [3] including the treatment of con- 
figuration space singularities. These tools were integrated as modules of our posi- 
tion analysis toolbox |32| using SOLID |3 34| as a collision detector and the GNU 
Scientific Library for the lineal algebra operations. Our position analysis toolbox 
is based on a formulation that yields a system of simple equations only contain- 
ing linear, bilinear, and quadratic monomials, and trivial trigonometric terms for the 
helical pair only |21|. The simplicity of the final system of equations makes it ad- 
vantageous for continuation methods |37|. For the purpose of comparison, we also 
implemented the RRT for constrained spaces presented in |9|. In this RRT, points 
are sampled in the ambient space and the nearest sample on the variety is progres- 
sively extended towards the random sample. At each extension step, the points are 
projected to the variety using the Jacobian pseudo inverse method. In our implemen- 
tation, the nearest-neighbor queries use the kd-tree described in ||4T1 . The maximum 



An implementation of these tools tailored for dynamical systems is available in 11331 . 
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Fig. 5 The four benchmarks used, a Star-shaped planar manipulator with three fixed extrem- 
ities, b Two-arms manipulator moving an object from one gap to another c Rotational-only 
parallel manipulator d Cyclooctane molecule. 



number of nodes in the RRT is set to 75000. Experiments were executed on a In- 
tel Core 2 at 2.4 Ghz running Linux. Finally, the algorithm parameters were set to 
r = 0.4, 5 = 0.04, a = 0.1, and /3 = 1.1 for all the experiments. 

Figure |5] shows the four benchmarks used in this paper. The first one is a planar 
star-shaped manipulator also used in ["28 1. In this case, obstacles are not consid- 
ered. The second problem involves a system where two arms have to cooperate to 
move an object from one gap to another. This problem previously appears in ifTTI . 
The movement between the start and goal configurations requires to traverse actu- 
ator singularities, which makes the problem unsolvable by basic Kinematics-based 
Roadmap approaches ll8l fT2l . The third example, kindly provided by Juan Cortes, 
is a parallel platform with rotation motion only. The task here is to move a stick 
attached to the robot across some obstacles. The last benchmark is the cyclooctane. 
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Table 1 Dimensionality of the ambient and configuration spaces, execution times and 
number of samples/charts used by a RRT and the HC-planner 











RRT 




HC 


RRT/HC 


Benchmarks 


k 


n 


Time 


Samples 


Time 


Charts 


T/T S/C 



Star-shaped 


5 


18 


3.87 


5515 


0.47 


199 


8.23 


27.71 


Two-arms 


3 


10 


22.94 


27130 


0.35 


391 


65.54 


69.38 


Parallel 


3 


27 


36.97 


13400 


0.92 


276 


40.18 


48.55 


Cyclooctane 


2 


17 


6.49 


6813 


0.28 


152 


23.17 


44.82 



a molecule whose kinematics is a 8-revolute loop. Here, we have to find a path be- 
tween two conformations that avoids self-collisions involving carbon and hydrogen 
atoms (depicted in cyan and white in the figure, respectively). 

Table [T] shows the performance comparison, averaged over 100 runs, between 
RRT and the HC-planner. For each of the four benchmarks, the table gives the di- 
mensionality of the configuration space {k), the dimensionality of the ambient space 
in), the execution times and the number of samples or charts used for each method. 
The table also shows execution time ratios (T/T) and the ratio between the number 
samples used in RRT and the number of charts used with the HC-planner (S/C). 
Note that the RRT in the Two-arms test case is unable to find a solution for 25% of 
the cases. In Table [T] the RRT results for this problem correspond to averages for 
the successful tests only. 

The results show that, for this set of problems, the execution time of the RRT 
is more than one order of magnitude higher than that of the HC-planner, except 
for the Star-shaped problem where the HC-planner is only about a 8 times faster. 
This is true despite the generation of samples being much faster than the generation 
of charts. This is so because charts are more powerful since they do not only de- 
scribe the variety on a single point but on a local neighborhood of a point. Thus, the 
HC-algorithm uses in average 40 times less charts than samples used by RRT. 

The advantage of the HC-planner search strategy with respect to a more optimal 
strategy is evaluated by applying a standard A* algorithm, implemented as described 
in the introduction of Sect. [3] A* can not solve the Star-shaped problem with less 
than 10000 charts. This is due to the curse of dimensionality: in a 5-dimensional 
configuration space the number of neighboring charts for each chart is about 40 and 
this results in a large exponential growth of the number of chart to generate even for 
simple problems. The Two-arms test can not be solved with R = 0.4 since with such 
a coarse resolution, almost all the transitions between chart centers are in collision. 
The problem can only be solved with R below 0.25. However, at this resolution the 
number of charts to generate increases to the point that the A* execution time is 
about 20 seconds. Our approach dynamically adapts the root of the charts to the 
distribution of obstacles, avoiding the generation of charts on blocked regions. Fi- 
nally, both the Parallel and the Cyclooctane problems can be solved with A*, but at 
a higher cost than the one used by RRT. In the cases that can be solved, though. A* 
returns a path that, in average, is about half the length of those obtained both with 



350 J.M. Porta and L. Jaillet 

the HC-planner or with RRT. However, the path length for both the HC-planner and 
the RRT could be improved with a smoothing post-process, which is not yet im- 
plemented in our system. Finally, note that A* is able to detect if two samples can 
not be connected because they are in different connected components of the vari- 
ety even in the presence of obstacles: in the worst case the atlas for the connected 
component including the start sample will be completed and if it does not include 
the goal sample the planning can be declared as a failure. The HC-planner trades off 
this completeness for efficiency introducing a randomness that prevents an atlas to 
be completed in the presence of obstacles. 



5 Conclusions 

In this paper, we extended the use of High-dimensional Continuation algorithmic 
tools for path planning applications. Using these tools, we defined a randomized 
path planner for highly constrained systems. The presented planner directly works 
on the configuration space, trying to connect any pair of query configurations with 
a small collection of local charts. The algorithm performance is highly independent 
of the relation between the configuration space and the ambient space. This is in 
contrast with existing sampling algorithms for constrained problems that generate 
samples in the ambient space. The experiments show that our approach can be more 
than one order of magnitude faster than state of the art algorithms. 

The worst case cost of the algorithm introduced in this paper is exponential with 
the dimension of the configuration space, which is in agreement with the cost of 
the best complete path planners |6|. Thus, the algorithm would not scale gracefully 
to high-dimensional problems. Despite this, the use of a greedy search strategy to- 
gether with the randomization allow to solve problems with moderate complexity 
(at least up to dimension 5 in the examples) embedded in even higher-dimensional 
spaces. Problems slightly more complexes than this are also likely to be addressable 
with the presented planner and this includes many interesting problems in Robotics 
and in Molecular Biology |5|. To scale to problems with even larger dimensional- 
ity we could rely on charts with larger area of influence. However this is likely to 
be valid only in almost lineal problems, where the error between the tangent space 
and the solution variety remains small over large areas. Moreover, the use of large 
charts limits the set of problems that can be solved since environments densely pop- 
ulated with obstacles typically require small charts. We would like to explore the 
possibility to define variants of the HC-planner where the role of the resolution is 
minimized in the same way as probabilistic roadmaps overcome the resolution lim- 
itations of cell decomposition methods. Another possibility to explore is to define a 
cost function over the configuration space so that the exploration could be limited 
to areas with low cost ITSl . All these points deserve a more careful evaluation. It is 
also our future endeavor to perform a more thorough experimental and theoretical 
analysis of the proposed algorithm, focusing on the performance of the algorithm 
for different obstacle settings. 
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Residual Dipolar Couplings for High-Resolution 
Automated Protein Backbone Structure 
Determination by NMR 
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Abstract. Developing robust and automated protein structure determination algo- 
rithms using nuclear magnetic resonance (NMR) data is an important goal in compu- 
tational structural biology. Algorithms based on global orientational restraints from 
residual dipolar couplings (RDCs) promise to be quicker and more accurate than 
approaches that use only distance restraints. Recent development of analytic expres- 
sions for the roots of RDC equations together with protein kinematics has enabled 
exact, linear-time algorithms, highly desirable over earlier stochastic methods. In 
addition to providing guarantees on the number and quality of solutions, exact al- 
gorithms require a minimal amount of NMR data, thereby reducing the number of 
NMR experiments. Implementations of these methods determine the solution struc- 
tures by explicitly computing the intersections of algebraic curves representing dis- 
crete RDC values. However, if additional RDC data can be measured, the algebraic 
curves no longer generically intersect. We address this situation in the paper and 
show that globally optimal structures can still be computed analytically as points 
closest to all of the algebraic curves representing the RDCs. We present new algo- 
rithms that expand the types and number of RDCs from which analytic solutions are 
computed. We evaluate the performance of our algorithms on NMR data for four 
proteins: human ubiquitin, DNA-damage-inducible protein I (DinI), the Z domain 
of staphylococcal protein A (SpA), and the third IgG-binding domain of Protein G 
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(GB3). The results show that our algorithms are able to determine high-resolution 
backbone structures from a limited amount of NMR data. 



1 Introduction 

Understanding the structures of biologically important proteins is one of the long- 
term goals in biochemistry. While automation has advanced many other aspects of 
biology, three-dimensional (3D) protein structure determination remains a slower, 
harder, and more expensive task. The speed at which protein structures are being 
discovered today is, for example, several orders of magnitude slower than that of 
gene sequencing. 

Two established experimental approaches for protein structure determination are 
X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. While 
crystallography can provide high resolution structures when good quality crystals 
can be grown, NMR spectroscopy is the only experimental technique currently ca- 
pable of measuring atomic-resolution geometric restraints and dynamics of proteins 
in physiologically-relevant solution state conditions. This sets a challenging goal in 
computational structural biology: design efficient automated structure determination 
techniques that exploit geometric restraints collected using NMR. 

Even with recent advances in reducing the acquisition time for NMR data |[T1 
it still remains advantageous to be able to determine protein structure from fewer 
experiments [2]. Therefore, to design successful computational approaches the in- 
terplay between all of the components of the structure determination process should 
be taken into account: the time and cost of NMR data acquisition, the types of NMR 
data and their information content, the algorithmic complexity of the problem of 
computing the 3D structure, and the accuracy of the obtained solution. 

Previous algorithms for structure determination problem by NMR can be divided 
into three categories: stochastic search, systematic search, and exact algorithms. The 
first category includes many widely-used approaches |[3]|4l|5]|6ll2l[ll> which per- 
form stochastic search over possible structures, scored according to their agreement 
with experimental data. These methods suffer from well-known pitfalls, such as lo- 
cal minima, undersampling, non-convergence, and missed solutions. While stochas- 
tic methods may perform adequately in data-rich settings, collecting a large number 
of NMR spectra increases the time and cost of the experiments, and still provides 
no guarantee on the quality of solutions. 

The second category involves systematic grid search over possible structures 
ll9l [T0lfTnfT2l and scoring according to the experimental data fit. Excessive compu- 
tational cost and undersampling due to insufficient resolution are the limitations of 
these methods. 

The development of sampling methodology was influenced by the computa- 
tional complexity of the structure determination problem using the nuclear Over- 
hauser effect (NOE) data. The recent introduction of residual dipolar couplings 
(RDCs) ifTSl fT4l has enabled novel attacks on the problem. In contrast to NOEs, 
which represent local distance restraints, RDCs measure the global orientation of 
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internuclear vectors. While many RDC-based methods still use stochastic search 
Il5l[l6l|9l[Ill[l8l|l2l[l9l[l0l despite its pitfalls, exact algorithms EOIETIIZI have 
recently emerged. These methods explicitly represent the RDC equations as well as 
protein kinematics in algebraic form to compute structures that optimize the fit to 
the RDC data. The exact algorithms not only guarantee completeness and polyno- 
mial running time, they also use a sparse set of RDC measurements (e.g. only two or 
three RDCs per residue), which reduces the time and cost of collecting experimental 
data. 

Implementations of these methods determine the solution structures by explicitly 
computing the intersections of algebraic curves representing discrete RDC values 
and protein kinematics ll20l 1211 121. However, if additional RDC data is measured, the 
algebraic curves representing the RDCs no longer generically intersect. Even though 
collecting additional experimental data may improve convergence of stochastic 
structure determination methods, development of efficient new exact methods is 
needed to handle this scenario. In this paper, we present algorithms that expand 
the types and number of RDC data from which analytic solutions are computed. 
When additional orientational restraints can be measured, the globally optimal struc- 
tures are calculated as points closest to all of the algebraic curves representing the 
RDC constraints. Therefore, the structures that optimally agree with the collected 
experimental data are determined analytically. 

Moreover, with additional RDC data our algorithms can compute structures of 
loops, more challenging regions in a protein for structure determination compared 
to secondary structure elements (SSEs), such as a-helices and /3-sheets. Loops in- 
crease computational complexity of the exact algorithms, because there is less phys- 
ical constraints on the structures that are possible for loops to assume, as is reflected 
in their Ramachandran statistics Il22ll . Therefore, we expand the domain of appli- 
cability of exact algorithms to structure determination problems which in practice 
only stochastic methods could handle before. This extends the benefits of analytic 
solutions into a new and important domain. 

In summary, the following contributions are made in this paper: 

• We present a general framework for computing protein backbone structures from 
sparse RDC measurements. 

• We describe new algorithms that handle two particular types of RDC data: two 
RDCs per residue in one or two alignment media, and multiple RDCs per residue 
in multiple media. 

• We present the analysis of our methods in terms of the upper bound on the 
number of optimal structures the algorithm can generate. 

• We evaluate the performance of our methods on NMR data sets for four pro- 
teins: human ubiquitin, DNA-damage-inducible protein I (DinI), the Z domain 
of staphylococcal protein A (SpA), and the third IgG-binding domain of Protein 
G (GB3). 

We start with background on RDCs in Section |2l We formally define the protein 
structure determination problem in Section[3] Section|4]presents the general frame- 
work for our exact algorithms, as well as detailed explanations of the methods that 
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handle two different types of RDC data (Sections 14. II and l43 . We present a study 
on the performance of our methods in Section|6]and conclusions in Section[7l 



2 Background 

The Physics of RDCs. During an NMR experiment, a protein in solution is placed 
in a static magnetic field. The magnetic field interacts with the nuclear spins of 
atoms of the protein, which allows collecting various NMR measurements. 

Consider a vector v between two NMR-active nuclear spins, a and b, of two 
atoms, and a vector representing the gradient of the magnetic field B in Figure[T] An 
RDC experiment detects the interaction between the nuclear spins. This interaction 
is measured in units of Hertz and can be expressed as 

^^ l^oYaYhfi / 3cos^e-l \ 

in which D is the residual dipolar coupling measurement, /io is the magnetic perme- 
ability of vacuum, h is the reduced Planck's constant, y is the gyromagnetic ratio, 
r^^b is the distance between the two spins, 6 is the angle between the internuclear 
vector, V, and B. The angle brackets represent an average over time and an ensemble 
of proteins in solution. 

Intuitively, as the protein tumbles in solution, the internuclear vector tumbles 
with the protein. The RDC measurement represents the time average over such 
movements. This measurement is of little use if the protein tumbles freely (isotropi- 
cally), since the average is zero and hence no RDC value is detected. An anisotropic 
solution is used to constrain the motions of the protein by adding an alignment 
medium to the solution. The resulting RDC measurement represents the relation- 
ship between the internuclear vector and the set of parameters associated with the 
alignment medium, known as the Saupe matrix ||23]| . or alignment tensor. 




Fig. 1 For an internuclear vector v between two NMR-active nuclear spins a and b, and a 
vector representing the gradient of the magnetic field B, an RDC experiment detects the in- 
teraction between the nuclear spins. This interaction is dependent on the internuclear distance 
r^ ^ as well as the angle 9 between vectors v and B. 
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The Tensor Formulation of the RDC Equation. A more convenient form of the 
RDC equation ([T]l for computing the geometric structure of a protein can be obtained 
after a series of algebraic manipulations 1 2 1 : 

D - DmaxV^Sv, (2) 

in which Dmax is the dipolar interaction constant, v is the unit internuclear vector, 
and S is the Saupe matrix |23| corresponding to the alignment medium used in the 
NMR experiment. 

To derive Equation ^ from ([T]l, a transformation to the molecular coordinate 
frame associated with the protein internuclear vector v is made ||201[2T1 . In such 
a coordinate frame, v is static, and the magnetic field vector, B, tumbles around v. 
The time and ensemble average over such tumbling of Z? is represented by the Saupe 
matrix, which we discuss later in this section. 

Note that Equation (O directly relates an RDC value to the orientation of v in the 
molecular coordinate frame. A sufficient number of such equations allows computa- 
tion of all internuclear vector orientations in a protein with respect to the molecular 
coordinate frame. In the rest of the paper, by denoting r = D/D„uix, we will work 
with the normalized RDC equation: 

r = v'^Sv. (3) 

Alignment Tensors. The Saupe matrix in Equations ^ and Q is a 3 x 3 symmetric 
and traceless matrix. It contains 5 degrees of freedom, 3 of which correspond to a 
3D rotation, and 2 are eigenvalues. 

Each alignment tensor can be diagonalized as S = R^SR, in which R is a 3D rota- 
tion matrix, and the traceless diagonal matrix S has eigenvalues {Sxx,Syy,Szz) ,Szz = 
—Sxx — Syy. Then, the RDC equation Q becomes 

r^Sx.^ + Syyy^ + S,,z^, (4) 




Fig. 2 The RDC sphero-quartic curves in the principal order frame for the alignment tensor 
(Si-vj^vv, S;,) . The unit internuclear vector v that satisfies the RDC equation (|4} is constrained 
to the sphero-quartic curves on the unit sphere. 



360 



A. Yershova et al. 



in which Rv = {x,y,z)- The coordinate frame that diagonalizes the Saupe matrix is 
called the principal order frame (POF) of the alignment medium. 

To design exact algorithms, we consider algebraic representations of solutions. 
Figure 12] shows the solutions to the RDC equation for {x,y,z) as sphero-quartic 
curves on a sphere in the principal order frame. These curves are the intersection 



of the unit sphere x 
tion (HI. 



-y 



1 and a hyperboloid representing the RDC equa- 



Protein Geometry and Assumptions on Dynamics. A protein is modeled as a 
collection of peptide planes. Each peptide plane contains the bond vectors: C"-C', 
C'-N, C'-O, N-H'^ and N-C". Peptide planes are joined through the bond vectors 
N-C" and C"-C', with two torsional degrees of freedom. Dihedral angles, (j) and i//, 
parametrize these joints (see Figure[3ll. The bond vectors C"-H" and C"-C'^ do not 
belong to peptide planes and form a fixed tetrahedral geometry with both N-C" and 
C"-C'. Often, it is convenient to divide a protein into residues. Most of the residues 
(excluding glycine and proline with different geometry) contain consecutive atoms 
H, N, C", H", C'^, C' and O, as shown in Figure[3l The backbone of a protein is a 
sequence of repeating C", C', and N atoms. 

There exist NMR techniques to measure RDCs on the C"-H«, C"-C', C"-C'^, 
N-H'*^, and C'-N bond vectors as well as on C'-H^ internuclear vector. 

Crucial assumptions about the dynamics of the protein are often necessary to 
solve for the large number of unknown variables in RDC equations. Initially, neither 
internuclear vectors v nor Saupe matrices S are known in the structure determina- 
tion process. The assumption that the protein fluctuates in a small ensemble about 
one principal mode, that is, the protein is more or less rigid, leads to a simplified 
dynamics model in which the gradient vector B of the magnetic field is assumed to 
tumble similarly with respect to each internuclear vector. We work with this model, 
because it is applicable to many proteins. However, when the protein is known to be 




Fig. 3 Protein geometry. Two peptide planes P(i) and P(i+1) and the atoms of the correspond- 
ing residue i are shown. Two torsional degrees of freedom of the backbone are represented 
by dihedral angles (j) and l//. 
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Table 1 A 0-defining RDC is used to compute the backbone dihedral 0, and a (//-defining 
RDC is used to compute the backbone dihedral (|/. 



-defining 


pa_jja C'^-C! C"-C^ 


i//-defining 


N-H'^,C'-N,C'-H'^ 



more flexible in solution, additional parameters representing this flexibility must be 
integrated into the model. 



3 The Protein Structure Determination Problem 

The protein structure determination problem considered in this paper is: given sparse 
RDC data for a protein, determine the alignment tensor(s) and dihedral angles of 
the backbone that produce the best fit to the experimental data as well as represent 
a valid protein model. 

Sparse RDC Data. We consider the case of experimental data, containing as few as 
two RDCs per residue in one or two media. To ease the presentation of our meth- 
ods, we differentiate between RDC measurements based on their location within 
a residue. Denote RDC measurements on internuclear vectors C"-H", C"-C', and 
C-C^ as (^-defining RDCs, and N-H'^, C'-N, and C'-H^ as xiz-defining RDCs (Ta- 
ble[T]l. In this paper we design exact algorithms for the following types of RDC data: 

(a) One (/)-defining RDC and one i//-defining RDC per residue in one or two media; 

(b) Multiple (^-defining RDCs and multiple (//-defining RDCs per residue in one or 
more media. 

In general, it may not be possible to record a complete set of such RDC data for 
the entire protein. When we describe our methods in Section |4l we assume that all 
of the RDC values are present for a protein fragment. We describe how we deal with 
some cases of incomplete data in Section|5] 

Data fit. The data fit function that we use is RDC root mean square deviation 
(RMSD). Denote Sj,j= l,...,m, all of the alignment tensors involved, and 0,, i//,, / = 
l,...n, the dihedral angles for residues 1 through n in the protein backbone. The 
RDC RMSD for given alignment tensors and dihedral angles is computed using the 
equation: 



a({S,}7=„ {(/.,■, V/,}Li) 



\ 



ji:irt~rtr^ (5) 

' k=l 



in which / is the total number of RDCs for all residues, r^ is the experimental RDC, 
and r'^ is the RDC value back-computed from the alignment tensors and the structure 
defined by the dihedral angles computed using Equation (O. 

Validation. To ensure that the solution structure is biologically meaningful, we val- 
idate it according to two criteria: Ramachandran regions ll22l . and van der Waals 
packing ll24l . 
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Next we proceed to the methods which solve the protein structure determination 
problem defined in this section. 



4 Methods 

The key to our methods is the use of explicit representation of the protein kinematics 
incorporated into the RDC equations. 

At this point, it is useful to introduce several notations. Without loss of gen- 
erality, we choose the principal order frame of Si (POFi) as the global coordinate 
frame. Within this coordinate frame. Si is diagonal, with eigenvalues Si_xx, Si^yy, and 
(^■^ivx ^ '^i.vv)- We denote the diagonal components of any other alignment tensor, 
Sj, as Sj,xx and Sj,yy. Rj,o denotes the orientation of the principle order frame of 
alignment tensor S^ in POFi . 

Within a protein fragment, there will be several coordinate frames associated with 
different internuclear vectors in the portion, and related to each other through the 
protein kinematics. Our algorithms keep representations of these frames in POFi . 

Consider a coordinate frame defined at the peptide plane Pi with z-axis along the 
bond vector N(/) — *■ H^(() of residue /, in which the notation a ^ b means a vector 
from the nucleus a to the nucleus b. The y-axis is on the peptide plane / and the angle 
between the v-axis and the bond vector N(() -^ C" (/) is 29. 14° as described in ||20]| . 
The jc-axis is defined based on right-handedness. Let R, p denote the orientation 
(rotation matrix) of Pi with respect to POFi . Then, Ri p denotes the relative rotation 
matrix between the coordinate system defined at the first peptide plane of the current 
protein portion and the principal order frame of the alignment tensor Si . We call it 
the orientation of the first peptide plane. 

Our methods follow the following framework. First, the diagonal and rotational 
components of the alignment tensors and the orientation of the first peptide plane, 
Ri/>, are estimated using methods described in Section [S] Next, the branch-and- 
bound tree search algorithm shown in Figure|4]is called, which computes the dihe- 
dral angles of the protein portion backbone. The tree encodes all the solutions to the 
system of RDC equations together with the kinematic equations that relate consec- 
utive internuclear vectors in the protein backbone. Depending on the types of RDC 
data, the branch-and-bound search performs different computations. We cover these 
differences in Sections [4. Ilandl4l2l 



4.1 Exact Solutions for Peptide Plane Orientations from One 
(j) -Defining RDC and One yr-Defining RDC 

In this section, consider the case of experimental RDC data that only contains one (j)- 
defining and one i//-defining RDC measurement per residue. This covers, for exam- 
ple, the case of having RDC measurements for C"-H", and N-H^ in one medium. It 
also covers having C"-H" RDC in one medium and N-H'^ RDC in second medium. 
Next, we describe the inductive step of the branch-and-bound tree search algorithm 
in Figure m 
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BRANCH-AND-BOUND (Si , . . . , S,„, R,./.) 

Input: Global orientation of the coordinate frame for the current internuclear vector, 

the RDC data for the next internuclear vector, and the alignment tensor(s) 

Output: Those dihedral angles that represent valid protein structures and 

best fit to the experimental data 

Branch: Use methods from Sections [4. ll and l4!2l to solve RDC equations. 

Each solution is a dihedral that represents a child node. 
Bound: Prune invalid children nodes based on: 

RDC RMSD, Ramachandran regions, and van der Waals packing 
Recurse: Compute global orientation of the coordinate frame for each valid 

child node, R,+i.p, call BRANCH-AND-BOUND (Si , . . . S„,, R,+ i,/.) 

Fig. 4 The outline of the branch- and-bound tree search algorithm. The tree encodes all the 
solutions to the system of RDC equations together with the kinematic equations that re- 
late consecutive internuclear vectors in the protein backbone. The algorithm systematically 
searches through these solutions and prunes those that do not satisfy the bound conditions. 
Different types of RDC data require different branch-and-bound criteria, which we cover in 
Sectionsl4niandl4r2l 

The algorithm uses R, /> to derive R,+ i,/> inductively after it computes the back- 
bone dihedral angles (pi and i//,. R,+ i,/> is then used to compute the dihedrals of 
the {i+ ly peptide plane. The angles 0, and i//, are computed using the following 
propositions. 

Proposition 1. Given the diagonalized alignment tensor components Si _xx (ind Si ^y, 
the orientation of the i peptide plane R,',/>, and a ^-defining RDC, r, for the corre- 
sponding internuclear vector, v, there exist at most 4 possible values of the dihedral 
angle (pi that satisfy the RDC. The possible values of (pi can be computed exactly and 
in closed form by solving a quartic equation. 

Proof. Let the unit vector vq = (0,0,1)^ represent the N-H'^ bond vector of residue 
i in the local coordinate frame defined on the peptide plane /",-. Let v = {x,y,zY 
denote the internuclear vector for the (^-defining RDC for residue / in the principal 
order frame. We can write the forward kinematics relation between v and vo as 

v = R;,/.R,R,((/),)R,-vo, (6) 

in which R/ and R, are constant rotation matrices that describe the kinematic rela- 
tionship between v and vq. R,-((^;) is the rotation about the z-axis by (pi. 

Let c and s denote cos (pi and sin</),, respectively. Using this while expanding 
Equation Q we have 

x=AQ+Aic+A2S,y = BQ+Bic + B2S, z = Cq + Cic + C2S, (7) 

in which A,,Z?,, C, for < ( < 2 are constants. Using Equation (O in the RDC equa- 
tion ([4| and simplifying we have 

Kq + Kic + K2S + K^cs + K4C^ + Kss^ = 0, (8) 
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in which ^,, < / < 5 are constants. Using half-angle substitutions 

(pi 1 — u^ 2m 

M = tan — , c==-— -^, andi=-— -^ (9) 

in Equation ([8]) we obtain 

g{u) ^Lq+Liu + L2U^+Lo,ic' +Uu^ = 0, (10) 

in which L,, < ; < 4 are constants. Equation (fTOt is a quartic equation which can 
be solved exactly and in closed form. Let {u\ ,U2,uj,U4} denote the set of four real 
solutions (at most) of Equation ( fTOl l. For each m, the corresponding dihedral angle 
0, can be computed using Eq. Q. D 

Proposition 2. Given the diagonalized alignment tensor components Si ,xx and Si,yy, 
the orientation of the i ' peptide plane R,_/>, the dihedral (pi, and a Xji-defining RDC, 
r, for the corresponding internuclear vector, \' , on the peptide plane Pi+i, there 
exist at most 4 possible values of the dihedral angle I//, that satisfy the RDC. The 
possible values of l//,- can be computed exactly and in closed form by solving a 
quartic equation. 

Proof. After representing the internuclear vector v' through v^ using protein 
kinematics: 

v' = R,-,/. R, R,{(Pi) R,- R', R,(i//,-) R;. Vo, (1 1) 

the proof is similar to that in Proposition[Tl since the value of (pi is known. D 

Proposition 3. Given the diagonalized alignment tensor components, the orienta- 
tion of the i peptide plane R/./>, a <p -defining RDC and a Xjl-defining RDC for (pi 
and y/i, respectively, there exist at most 16 orientations,Ri^i,p, of the peptide plane 
Pi+i that satisfy the RDCs. 

Proof. This follows from the direct application of Propositions [T] and |2l D 

Proposition 4. Given the diagonalized alignment tensor components S\^xx Qwcf S\jy 
for medium 1, S2.XX (^nd S2^yyfor medium 2, a relative rotation matrix R2,0' the orien- 
tation of the / peptide plane R,-./), a <p -defining RDC in medium 1 and a \f/-defining 
RDC in medium 2 for (pi and 1//,-, respectively, there exist at most 16 orientations, 
Ri+i./". of the peptide plane P,+i that satisfy the RDCs, which can be computed 
exactly and in closed form by solving two quartic equations. 

Proof. It follows the proof of Proposition [31 once the transformation to the 
principal order frame of medium 2 is made by v' = R^^v to compute the value 
of I//;. D 
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4.2 Exact Minima for Peptide Plane Orientations from Multiple 
<l) -Defining RDCs and Multiple yz-Defining RDCs 

Now, consider the case when additional RDC data has been collected, and more 
than one (/) and i//-defining RDC measurements are available in one or more media. 
This covers, for example, the case of having RDCs for C"-H", C"-C', N-H'^, and 
C'-N in one medium. It also covers the case of having C"-H" and N-H'^ RDCs in 
two media. The inductive step of the tree search in Figure|4]is performed using the 
following propositions. 

Proposition 5. Given the diagonalized alignment tensor components Sj^xx and Sj, yy, 
the rotations between principal order frames, R/.o, the orientation of the f peptide 
plane R,\/>, and multiple (^-defining RDC for the corresponding intemuclear vector 
V of residue i, the global minimum of the RDC RMSD function for v can be computed 
exactly. There exist at most 4 possible values of the dihedral angle (j)i that minimize 
the RDC RMSD function, and such values of (pi can be computed exactly. 

Proof. Let / be the number of RDC equations available for the internuclear vector 
V. The RDC RMSD function for v is a univariate function of (/): 



a(<^) 



\ 



^iid-rtr, (12) 

' k=i 



in which r^ is the back computed RDC value, r^ = y^SjV, for the appropriate align- 
ment medium S;. Similarly to the proof of Proposition^ v can be represented as 

v = R,-,fR;R, ((/),•) R.Vo. (13) 

After denoting cos (pi and sin (pi as c and s, respectively. Equation ( fTSl ) becomes 

x=Ao+Aic+A2S,y = BQ+Bic + B2S, z = Co + Cic + C2S, (14) 

in which A/,fi/,C, for < / < 2 are constants. Substituting x,y, and z into each RDC 
term of Equation (fT2t and using half-angle substitutions we obtain: 



o{u) 



\ 



lhsM)\ (15) 



in which gk{u) are quartic polynomials for each medium k as in Equation [TOl 

Equation (fTSl l defines a univariate function of u that can be minimized exactly, 
by finding zeroes of its derivative function. Let {ui,U2,uj,,U4} denote the set of four 
minima (at most) of Equation ( fTSl ). For each m,- the corresponding dihedral angle <pi 
can be computed using Eq. ([9]). D 

Proposition 6. Given the diagonalized alignment tensor components Sj,xx (ind Sj,yy, 
the rotations between principal order frames, R;,o. t^^ orientation of the { peptide 



366 A. Yershova et al. 

plane R,\/>, the dihedral (pi, and multiple lj/-defining RDCs for the corresponding 
intemuclear vector v' on peptide plane Pi+i, the global minima of the RDC RMSD 
function for y' can be computed exactly. There exist at most 4 possible values of the 
dihedral angle Xj/t that minimize the RDC RMSD function, and such values ofxj/i can 
be computed exactly. 

Proof. The proof is similar to that in Proposition [5] after the transformation as in 
Proposition[2]is used. D 

Proposition 7. Given the alignment tensors {S;}'"^j, the orientation of the peptide 
plane Pi, multiple (^-defining RDC and multiple ijf-defining RDC for 0,- and i//;, re- 
spectively, there exist at most 16 orientations of the peptide plane Pj+i with respect 
to Pj that minimize the RDC RMSD functions for each of the intemuclear vectors. 

Proof. This follows from the direct application of Propositions [5] and |6l D 

Note that the case of data described in this section allows comparing RDC RMSD 
for the branches of the search tree in Figure|4l This enables reducing the size of the 
search tree by pruning the branches based on RDC RMSD, which was not possible 
for the data described in Section l4TT] since RDC RMSD was always 0. Pruning based 
on Ramachandran regions and steric clashes alone is not always enough to compute 
the structures of protein loops. In Section[6]we show that if RDCs in second medium 
are measured, pruning based on RDC RMSD allows computation of loops. 



5 Alignment Tensors, Orientation of the First Peptide Plane, 
and Packing 

In our current implementation we estimate alignment tensor(s) similarly to Il2ni20ll . 
by using singular value decomposition (SVD) ll25l method to fit experimental RDC 
data to the corresponding vectors of an a-helix with ideal geometry. After that, the 
alignment tensor(s) are iteratively refined by using the computed helix structures by 
our exact algorithms. Once the values of the alignment tensor(s) are estimated, other 
fragments of the protein are computed using these values. 

We can use uniform samples over rotation matrices to obtain the orientation of 
the first peptide plane in a protein fragment that result in structures with the best fit 
to the RDC data. For certain types of RDC data, however, the orientation of the first 
peptide plane in a fragment can be computed analytically. 

The resulting running time complexity of our algorithms is linear, and the analy- 
sis is similar to ||2T1 . As described in (^]|, we use a divide-and-conquer approach in 
which the protein is first divided into 0{n) fragments of constant length (typically 
5-14 residues) based on their secondary structure type (a-helix, j3-sheet, loop), and 
then our algorithms from Section |4] are applied to determine the orientations and 
conformations of these fragments. In contrast to 121,1 . in which an algebraic geom- 
etry approach for finding the structure that minimizes the RDC fit for various RDC 
data was described, in this paper we have presented algorithms that achieve the same 
goal, but are simple and practical to implement. 
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In some cases reliable computation of the structure of certain fragments in a 
protein is not possible from sparse RDC alone. This may happen due to missing 
RDCs for certain residues. It may also happen due to the large size of the set of 
possible solutions returned by methods above. To overcome this problem, we use 
a sparse set of distance restraints (NOEs) to assemble the fragments. Our packing 
method lf30l considers all possible discrete translations of the fragments over a three 
dimensional grid (within a parametrized resolution) that satisfy these sparse NOEs. 

We also incorporate sparse unambiguous NOEs to pack /3-sheets. We use ro- 
tamers from the Richardsons' Rotamer Library [49], and model the side-chain NOEs 
to pack the strands while they are being computed using the methods we described 
in the previous sections. A composite scoring scheme is used as a bound criterion 
in the tree search is based on a combination of (1) RDC RMSD, (2) RMS dihedral 
deviation from an ideal strand, (2) hydrogen bond score, i.e., a combination of RMS 
deviation of proton-acceptor distance and RMS deviation of donor-proton-acceptor 
angle violation, (3) score from the steric checker, and (4) NOE RMSD. 

However, when a sufficient number of RDCs is measured for a protein, the divide- 
and-conquer approach is applied to the neighboring fragments sequentially, and, 
therefore, packing of the fragments does not require NOEs. 

6 Results 

We implemented our algorithms in a software package called RDC-ANALYTIC. 
Table [2] shows the results of the application of RDC-ANALYTIC on datasets for hu- 
man ubiquitin (PDB id: lubq |26|, DNA-damage-inducible protein I (DinI, PDB 
id: Ighh 113), Z-Domain of Staphylococcal Protein A (SpA, PDB id: lq2n 1281 ), 
and the third IgG-binding domain of Protein G (GB3, PDB id: lp7e HD). For these 
proteins, the experimental NMR data has been taken from the Biological Magnetic 
Resonance Data Bank (BMRB). For ubiquitin, our program estimates the alignment 
tensors for different sets of RDCs and computes the helix (Ile23-Lys33) conforma- 
tions. The results show that the corresponding alignment tensor components com- 
puted from different sets of RDCs in one medium or two media agree fairly well 
with those computed from ubiquitin NMR structure (PDB id: ld3z). As shown in 
Table m the backbone RMSDs of the helices compared to the X-ray structure (PDB 
id: lubq) and NMR reference structure (PDB id: ld3z) are small. The global folds 
of ubiquitin, DinI and SpA computed from different sets of RDCs and sparse sets of 
NOEs are shown in Figure |5] The results are summarized in Table |2l For ubiquitin 
and DinI we used sparse sets of NOEs which involve only amide and methyl pro- 
tons obtainable from '//-'^C-ILV methyl labehng. We also used C"-C' and N-H"^ 
RDCs. For ubiquitin, the backbone RMSDs between the structures computed our 
algorithm and the reference structures (Model 1 of ld3z, and lubq) is < 1.28 A. 
For, DinI we computed the backbone fold using C"-C' and N-H^ RDCs. Compared 
to the reference structure (Model 1 of Ighh) the backbone RMSD was 1.11 A. 

For SpA we performed three runs of our program. In the first two runs, we used 
C"-H" and N-H'^ RDCs and selected different sets of parameters. For the first run. 
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Table 2 Results of RDC-ANALYTIC. (a) Experimental RDC data for ubiquitin (PDB id: ld3z), 
DinI (PDB id: Ighli), SpA (PDB id: lq2n), and GB3 (PDB id: lp7e) are taken from the 
Biological Magnetic Resonance Data Bank (BMRB). The SSE backbones are computed for 
different combinations of RDCs in one or two media. If RDC measurements in two media 
are collected for a bond vector, we denote it by 2x in the table (e.g. 2xC"-H"). For ubiquitin 
the computed SSEs are compared with both the X-ray structure (PDB id: lubq) and the 
NMR structure (PDB id: ld3z, Model 1). For DinI, SpA and GB3, since only the NMR 
structures are available, we compare our SSEs with Model 1 of the respective ensemble, (b) 
Simulated RDCs obtainted from the reference structure are used, (c) Simultaneous structure 
computation and assembly of /3 -strands into /3 -sheets of ubiquitin and DinI are done using 
13 and 6 NOEs, respectively, which involve only amide and methyl protons obtainable using 
'H-'^'C-ILV methyl labeling, (d) For ubiquitin, DinI and SpA we used 5, 10 and 10 C"- 
C" distance restraints, respectively, to pack the SSEs and obtain the maximum likelihood 
backbone folds, (e) Simulated RDCs from lp7e Model 1 are used only for the missing RDC 
values in the experimental data. 



Protein 


RDCs" used & 


Alignment Tensor(s) 


Backbone RMSD (A) 




RMSD (Hz) 


(Sv,-,S,_-) 


vs. X-ray/NMR structure 


Ubiquitin'" a:23-33 


C"-H°: 1.11,N-H^: 0.740 


15.230, 24.657 


1.276 


j3:2-7, 12-17, 41-45, 65-70 


C-C: 0.129, N-H*"': 0.603 


14.219, 25.490 


1.172 


Dinr" 








a: 18-32,58-72 


C-C: 0.483, N-H^: 1.203 


10.347, 33.459 


1.111 


;8:2-8, 39-44, 49-53 








SpA"' 


(ran 1 ) C° -H« : 0.458,N-H'' : 2. 1 1 


8.008, 23.063 


1.063 


a:8-I7, 24-36, 41-54 


(ran2) C°-H°: 0.678,N-H'"': 0.543 


8.146,24.261 


1.577 




''(ran3)C°-C': 1.237,N-HN: 1.049 


7.676, 22.961 


0.834 


Ubiquitin a:25-31 


2xC"-H": 0.93, 2xN-H": 0.32 


16.9, 23.2; 7.0, 52.4 


0.403 


loop:54-58 


2xC"-H«:2.2,2xN-H'^:0.7 


16.9, 23.2; 7.0, 52.4 


0.409 


loop:59-64 


2xC"-H": 1.9, 2xN-H'^: 1.2 


16.9, 23.2; 7.0, 52.4 


0.652 


loop/^:64-70 


2xC"-H":3.1,2xN-H'^: 1.2 


16.9, 23.2; 7.0, 52.4 


0.49 


/i:2-7 


2xC«-H'':2.6, 2xN-HN: 1.4 


16.9, 23.2; 7.0, 52.4 


0.64 


J3:I1-17 


2xC''-H'':2.6,2xN-H'^: 1.5 


16.9, 23.2; 7.0, 52.4 


0.50 


j3:41-55 


2xC"-H«:2.2,2xN-HN:0.8 


16.9, 23.2; 7.0, 52.4 


0.44 


GB3<'1 a/loop:23-39 


2xC''-H«: 1.7, 2xN-H'^: 1.6 


47.0, 19.2; 23.8, 12.6 


0.35 


loop/p:39-51 


2xC''-H":0.9, 2xN-H'^: 1.3 


16.9, 23.2; 7.0, 52.4 


0.49 


loop//3:51-55 


2xC''-H'':0.7, 2xN-H'~':0.5 


16.9, 23.2; 7.0, 52.4 


0.54 



the backbone fold computed by our algorithm is within 1 . 1 A of the reference struc- 
ture (Model 1 of lq2n). For the second run, we used a narrow sampling interval 
for N-R^ RDCs, and as a result the N-H'^ RDC RMSD of the structure computed 
was better than that for the structure computed in the first run. However, when we 
packed the SSEs computed from the second run and then compared the resulting 
backbone fold with the reference structure (Model 1 of lq2n), the backbone RMSD 
was 1.58 A, slightly higher than that from the first run. We found that when the first 
two helices (Glu24-Asp36, Ser41-Ala54) are compared with the reference structure, 
the backbone RMSD was 0.72 A. The N-H'^ RDCs are missing for Glu8 and Gln9 
that define the first two residues of the helix Glu8-Leul7, which probably led to 
somewhat poor conformation for this helix. We then simulated the C"-C' and N-H^ 
RDCs using lq2n Model 1. Using these simulated RDCs, we computed the global 
fold of SpA during the third run. When compared with the reference structure, the 
backbone RMSD was 0.83 A. 
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Fig. 5 The global folds of ubiquitin (lA), DinI (2A) and SpA (3A), computed by RDC- 
ANALYTIC, using C"-C' and N-H^ RDCs and a sparse set of NOEs. For ubiquitin and DinI 
experimental C«-C' and N-H'^ RDCs are used. For SpA simulated C«-C' and N-B^ RDCs 
are used. (IB) Overlay of the ubiquitin global fold (green) computed by RDC-ANALYTIC with 
the X-ray structure (red). The backbone RMSD is 1.17 A. (2B) Overlay of the global fold of 
DinI (green) computed by RDC-ANALYTIC with the Model 1 (red) of the reference structure 
(PDB id: Ighh). The backbone RMSD is I. II A. (3B) The global fold of SpA computed by 
RDC-ANALYTIC is overlaid on the Model l(red) of the reference structure (PDB id: Iq2n). 
The backbone RMSD is 0.83 A. 



For both ubiquitin and GB3 we applied RDC-ANALYTIC to compute portions 
(including helices, loops and /3-strands) from C"-H" and N-H^ RDCs in two media. 
The results for portions of ubiquitin are reported in Table [J] The resulting overlay 
of the residues 23-55 of GB3 backbone (red, green, and black) computed by RDC- 
ANALYTIC with the NMR reference structure (PDB id: lp7e Model 1 |^) is shown 
in Figure|6l Since the experimental data for GB3 is not complete (28 out of 132 C"- 
H" and N-H'^ RDCs in two media are missing for for residues 23-55), we simulated 
the missing RDCs using the NMR reference structure. We then computed several 
consecutive portions of the protein in a divide-and-conquer fashion. The overall 
backbone RMSD with the reference structure was 1.47 A. 
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Fig. 6 Overlay of the residues 23-55 of GB3 backbone (red, green, and black) computed by 
RDC- ANALYTIC with the NMR reference structure PDB id: lp7e Model 1 1 29 1 (blue). Several 
portions of the protein were computed in a divide-and-conquer fashion. The backbone RMSD 
of the portions are the following residues: 23-39(red) - 0.35 A, 39-51(green) - 0.49 A, 51- 
55(black) - 0.54 A. The overall backbone RMSD with the reference structure is 1.47 A. 



The above tests on both real NMR data and simulated data demonstrate the ca- 
pability of RDC-ANALYTIC to determine high-quality backbone fold. As part of our 
future work, we plan to apply our algorithms to determine backbone structures from 
NMR data collected for larger proteins. 

7 Conclusions 

We developed algorithms for protein structure determination using residual dipolar 
couplings collected by solution-state NMR. Our algorithms take into account differ- 
ent aspects of the structure determination process, such as the time and cost of NMR 
data acquisition, the types of NMR data and their information content, the algorith- 
mic complexity of extracting the 3D structure, and the accuracy of the obtained 
solution. By using RDCs, we reduce the algorithmic complexity of the structure de- 
termination problem to linear time. To reduce the cost of NMR data acquisition, our 
methods use sparse RDC data, specifically, as little as two RDC measurements per 
residue in a protein. We develop exact algorithms to compute analytic solutions that 
optimally fit the NMR data producing high quality structures. 

The key to our algorithms is the explicit representations of the RDC equations 
together with the protein kinematics. Geometrically, this representation results in 
algebraic curves on a unit sphere that may or may not intersect, depending on the 
type and number of RDC measurements collected for a single internuclear vector in 
a residue. Our algorithms find the points on the unit sphere located closest to all of 
the algebraic curves exactly. These points correspond to the dihedral angles of the 
protein that optimally fit the RDC data. 

We tested our algorithms on NMR data for several proteins: human ubiquitin, 
DinI, SpA, and GB3. Previous versions of our algorithms (3U\ (which were not ex- 
act for as many types of RDC data as the new algorithms presented in this paper) 
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have been extensively tested on NMR datasets for different proteins, as well as used 
in a prospective study to solve the structure of the FF Domain 2 of human transcrip- 
tion elongation factor CA150 (PDB id: 2kiq ll30l ). We plan to do more extensive 
experimental tests on different NMR data using the algorithms proposed in this pa- 
per. We also plan to apply our algorithms to solve other new protein structures in 
our future work. 
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LQG-Based Planning, Sensing, and Control of 
Steerable Needles 

Jur van den Berg, Sachin Patil, Ron Alterovitz, Pieter Abbeel, and Ken Goldberg 



Abstract. This paper presents a technique for planning and controlling bevel-tip 
steerable needles towards a target location in 3-D anatomy under the guidance of 
partial, noisy sensor feedback. Our approach minimizes the probability that the nee- 
dle intersects obstacles such as bones and sensitive organs by (1) explicitly taking 
into account motion uncertainty and sensor types, and (2) allowing for efficient opti- 
mization of sensor placement. We allow for needle trajectories of arbitrary curvature 
through duty-cycled spinning of the needle, which is conjectured to make a needle 
path small-time locally "trackable" lfT3] . This enables us to use LQG control to 
guide the needle along the path. For a given path and sensor placement, we show 
that a priori probability distributions of the needle state can be estimated in advance. 
Our approach then plans a set of candidate paths and sensor placements and selects 
the pair for which the estimated uncertainty is least likely to cause intersections with 
obstacles. We demonstrate the performance of our approach in a modeled prostate 
cancer treatment environment. 



1 Introduction 

We consider the problem of planning, sensing, and controlling a bevel-tip steerable 
needle towards a target in 3-D anatomy with obstacles, such as sensitive and impen- 
etrable tissue. Needles are used in many forms of medical diagnosis and treatment, 
and accurately reaching a specific target is required in many procedures such as 
tissue biopsies and placement of radioactive seeds for cancer treatment. Bevel-tip 
steerable needles are asymmetric-tip, flexible needles that move along trajectories 
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of constant curvature Kq when pushed forward f271. The direction of motion can be 
changed by reorienting the bevel tip through twisting of the needle at its base. This 
allows for steering the needle around anatomical obstacles towards previously in- 
accessible targets, and allows for significantly reducing patient trauma by avoiding 
the puncturing of sensitive tissues. 

Planning and controlling the motion of a steerable needle is a challenging prob- 
lem. A steerable needle is controlled from its base through only insertion and twist- 
ing, and we do not allow retractions and re-insertions as that results in excessive 
tissue damage. As such, a steerable needle is a highly underactuated non-holonomic 
system. In fact, the needle is not small-time locally controllable, and a natural nee- 
dle path (with curvature Kq) is not small-time locally "trackable" (i.e., the deviation 
with respect to the path is not small-time locally controllable) |13|. In addition, 
the motion of the needle is subject to uncertainty due to tissue inhomogeneity, tis- 
sue deformation, needle torsion, etc. Il7l l23l . To address this issue, we follow the 
suggestion of Kallem lfT3l that a path with a smaller curvature < K < Kq is small- 
time locally trackable and that this can be achieved using duty-cycled spinning of 
the needle during insertion. This would enable us to use feedback control to guide 
the needle along a pre-planned path. Our experiments suggest that this is indeed the 
case. The sensor feedback, however, may be noisy and partial as current medical 
imaging technology does not allow for measuring the full state of the needle tip (the 
imaging resolution is often too low to infer its orientation, for instance ||T2| . and 
often only provides planar views in real-time feedback situations). 

Our objective is to compute a sensor placement and a needle path to the target 
location, such that the path's execution using LQG control has a minimal probability 
of intersecting obstacles in the anatomy, given a stochastic model of the motion and 
sensing uncertainty. Our approach is as follows. First, we build on work of ITSlI to 
encapsulate the (high-frequency) duty cycled spinning of the needle in a higher-level 
kinematic model that allows direct control of the curvature of the needle motion. 
We then derive an LQG controller (consisting of a Kalman filter for state estimation 
and an LQR control policy) for the extended kinematic model to optimally guide 
the needle along a given path. Based on the sensor placement, we can compute in 
advance the a priori probability distributions of the state of the needle along the path 
II25I . From these distributions and a geometric model of the anatomical obstacles, 
we can quickly compute the probability that the needle will intersect obstacles. Our 
method then plans a set of candidate paths using a variant of the RRT algorithm lfT4l 
and samples a set of feasible sensor placements, and then selects the pair for which 
this probability of obstacle intersection is minimal. 

The type and placement of sensor(s) can have a big influence on which path is 
optimal (see Fig. [T]). For example, if a sensor obtains a 2-D projection along the 
X-axis of the 3-D position of the needle tip, there will be more uncertainty in the 
x-coordinate of the needle state than in the y- and z-coordinates. If we then have to 
steer the needle through a passage that is narrow in either the y- or the x-direction, 
our path planner will prefer the passage for which the probability of intersecting 
obstacles is less, which is the one that is narrower in the v-direction. Given a set 
of candidate paths and the space of possible sensor placements, our approach will 
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Fig. 1 Two examples of sensor placement, in which (left) only the x- and z-coordinate and 
(right) only the y- and z-coordinate of the needle-tip are measured by the imaging device 
(blue). Different paths will be optimal even as the obstacles (grey) and target location (cross) 
are the same. 

choose a needle path and find an axis along which a projection is obtained that 
minimizes the probability of collisions for the chosen needle path. We demonstrate 
the performance of our approach in modeled prostate cancer treatment environments 
with simulated uncertainty for different examples of sensor models. 

The remainder of this paper is organized as follows. We discuss previous work 
in Section [2l and review the kinematic model of a steerable needle in Section [3] In 
Section m we derive an LQG controller to optimally guide the needle along a given 
path. In Section |5] we show how to estimate the probability of obstacle intersection 
for a given path and sensor placement, and present our path and sensor planner. We 
present simulation results in Section|6]and conclude in Section]?] 



2 Related Work 



A significant body of previous work exists on planning and/or controlling bevel-tip 
steerable needles. A kinematic model for a steerable needle generalizing a unicycle 
model was introduced in JZTJI . and has been used by most subsequent work on needle 
steering, including this paper. In lfT6l . it was shown that in addition to the insertion 
and rotation speed, the curvature of the needle path can be controlled through duty- 
cycled spinning of the needle during insertion. 

2-D planners that address motion uncertainty have been presented in {Jim, which 
optimize a Markov decision process (MDP) over a discretized state space to provide 
feedback control assuming full state observation. The approach was extended in ll22l 
and integrated with imaging feedback. In ||T21 . a feedback controller is presented 
to stabilize the needle in a plane. Tissue deformation is taken into account in the 
planner of Ul, which optimizes a path using 2-D FEM simulation of soft tissue. 

Needle path planners for 3-D environments with obstacles have been proposed 
in (8] |9l, based on optimization and inverse kinematics, respectively. Rapidly- 
exploring random trees (RRTs) have been used in ||29l [30l l20ll to explore the entire 
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3D space of feasible paths. These approaches do not address issues such as uncer- 
tainty in motion and sensing. A diffusion-based planner was introduced in flSl, but 
it does not take into account obstacles or sensor types. A feedback controller for 
3-D needle steering was presented in flT], and proved robust against motion devia- 
tion and sensing noise, even for a greedy control policy. The approach is not able to 
guide the needle along a prescribed path and does not address obstacle avoidance. 

The approach we present in this paper extends these previous works and applies 
to 3-D environments with obstacles, takes into account motion and sensing uncer- 
tainty, and does not require discretization of the state-space. Our approach to needle 
steering is the first that specifically addresses the sensing capabilities and its effect 
on optimizing the path. 



3 Needle Kinematics 

We base our motion model of a bevel-tip steerable needle on the idealized kinematic 
model of IZTl . using the nomenclature of [8|, in which the needle state is represented 
by a rigid body transformation. This model assumes that the motion of the needle 
is fully determined by the motion of the tip, which is beveled such that it follows 
a perfect arc of curvature jcq when pushed forward, independent of insertion speed 
and tissue properties. The model further assumes that the needle is flexurally flexible 
(it bends to follow the needle tip), but axially and torsionally stiff, such that the 
insertion and twisting of the needle at its base is directly transmitted to its tip. 

The state of the needle tip can be described by a 3-D rigid body transformation 
relative to a world coordinate frame, which is compactly represented by a 4 x 4 
transformation matrix Z £ SE{3); 



X 



Rp 

1 



(1) 



where R G S0{3) is a 3 x 3 rotation matrix describing the rigid body's orientation, 
and p G R^ a vector describing the rigid body's position. 

The kinematics of a rigid body, i.e. the evolution of its state over time, can gener- 
ally be described a follows. Let v G K^ and w G M^ be the vectors of instantaneous 
linear and angular velocities, respectively, expressed in the local coordinate frame 
attached to the rigid body. Then (using notation ' to refer to the time derivative): 



X'=XU, U 



[w\ V 




(2) 



where 4x4 matrix U G se{3) is the twist of the rigid body. The notation [a] for a 
vector a = [ax % a^]^ G M? refers to the following 3x3 skew- symmetric matrix: 
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-ay 


ax 






(3) 



LQG-Based Planning, Sensing, and Control of Steerable Needles 377 



Fig. 2 Local coordinate 
frame attached to the needle 
tip. Its kinematics are shown 
by yellow arrows. Figure 
reproduced with permission 
from 0. 




VK 1 



Z V 



When the twist U is constant, the state of the rigid body at time / given the initial 
state at time is computed explicitly by integrating Eq. ^, for which a closed form 
expression exists |8l: 

X{t)^X(0)exp{tU). (4) 

For the steerable needle, the local coordinate frame is rigidly attached to the tip of 
the needle such that the z-axis points along the forward direction of the needle, and 
the j-axis points along the bevel direction (see Fig. O. The motion of the needle 
is determined by two control inputs: the linear forward speed (i.e. the speed with 
which the needle is inserted), denoted v, and the bevel orientation speed (i.e. the 
speed with which the needle is twisted at its base), denoted ft). Hence, the linear 
and angular velocities of the needle tip given the control inputs v > and ft) are 
V = [O v] and w = [vKq ft)] , respectively, where Kq is the (fixed) curvature 
of the arc the needle follows through the tissue. 

This needle model is constrained by the fact that the curvature Kq of the needle 
paths is fixed. In recent work, however, Minhas et al. ifTSll show that by performing 
duty cycled spinning of the needle during insertion, any curvature K of the needle 
motion between and Kq can be approximated. The greatest degree of curvature 
{k = K"o) is achieved by no spin at all, while a straight trajectory (k = 0) is cre- 
ated by constantly spinning the needle at a high (infinite) rate during insertion. Any 
trajectory in between these two extrema can be approximated by duty cycling the 
spinning in a spin-stop-spin-stop fashion. Longer stop intervals create steeper cur- 
vature of the needle, and shorter stop intervals create straighter trajectories. To be 
precise, the proportion < a < 1 of the time spent in spin intervals to approximate 
a curvature of < ic < Kq is given by: 

a = 1 - k/kq. (5) 

Let the needle perform a 2k rotation each spin interval, and let the spin intervals be 
of a small and constant duration 5. Then, the period of one spin-stop cycle is 5 /a. 
In order to incorporate the duty cycling into the kinematic model of the needle, the 
control input (o{t) (parameterized by time t) is adjusted to: 

(2n/d + w i{j<t/{5/a)<j + a (spin) 

"^^^~| w i{j + a<t/{d/a)<j+l, (stop) 
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for any j G Z, where w is the higher-level control of the speed with which the needle 
is rotated at its base (at the lower level, it is augmented with rapid In rotations). 

By taking the limit 5^0, the resulting high-level kinematic model is similar 
to the low-level kinematic model described above, with the difference that (O is 
replaced by w, and the curvature K is added to the set of control inputs. The high- 
level twist U given high-level control inputs v, w and K is then given by: 



U = 



[w] V 
1 



v= [0 v]^, 



w 



[vK 



(7) 



To account for the uncertainty the motion of the needle is subject to due to tissue 
inhomogeneity, tissue deformation, needle torsion, etc., we augment the model by 
assuming that the twist is corrupted by additive noise U drawn from a zero-mean 
Gaussian distribution with variance M: 



U 



[w\ V 
1 



:m-^(0,M), 



The stochastic kinematics of the needle state X is then given by: 

X' ^X{U + U). 



(8) 



(9) 



4 LQG Control for Steerable Needles 

Let us be given a needle path FI consisting of states X and control input twists U 
formed from control inputs v > 0, w and < Jc < icq (see Eq. Q), such that 



x'=xu, x = 



1 



u = 



[Vf\ V 





(10) 



That is, the path is consistent with the needle kinematics without noise, and as 
conjectured in |[T3l . the path is small-time locally trackable since < K" < Kq. 

During control of the needle along the path i7, we can assume that we obtain 
potentially noisy and partial observations of the state as feedback from sensors, in 
order to compensate for unexpected needle motion. We assume that this feedback 
will be according to the following (general) sensor model: 



z = /j(X,q), q^^(0,e), 



(11) 



where z is a vector of measurements that relates to the state X through function h, 
and q is the measurement noise drawn from a zero-mean Gaussian with variance Q. 
To control the needle along the needle path FI, we use the LQG-controller, since 
it provides optimal control for linear Gaussian motion and sensor models with a 
quadratic cost function penalizing deviation from the path. The LQG controller uses 
a Kalman filter for state estimation in parallel with an LQR control policy. Since our 



LQG-Based Planning, Sensing, and Control of Steerable Needles 



379 



motion and sensor model are non-linear, we approximate them with local 
linearizations around the path. This is reasonable as the needle is controlled to stay 
close to the path during execution. 

For purposes of linearization, we will not directly control the state itself, but 
rather the deviation of the state with respect to the path. This is also convenient for 
dealing with the 3-D orientation in the needle state, which either has a redundant 
but internally constrained representation (e.g. a quaternion or rotation matrix) or a 
minimal but singularity-prone representation (e.g. Euler angles). Assuming that the 
deviation is small, the orientation deviation is "far away" from any singularities, 
and can hence safely be represented by three mutually unconstrained parameters. 
Whereas many other works use quaternions to represent rotations llT0l[T5ll24l . we 
will linearize for the 4x4 matrix X G SE{3) (roughly following ||4|), and derive a 
Kalman filter and LQR controller for the needle. 

4.1 Model Linearization and Discretization 



We define the state deviation X as the transformation between the path state X and 
the (unknown) true state X, and the twist deviation U as the difference between the 
true control input twist U and the control input twist U along the path: 



X 



1 



x-'x 



R^R F(p-p) 
1 



u 



[w\ V 




u-u. 



(12) 



where the last equality follows from the fact that X ' = [^f, ^ ^] • The kinematics 
of the state deviation, i.e. its evolution over time, is given by 



X' = x-^x' + {x-^yx = x-^x{u + u)- ux-^x 
= X{U + U)-UX= X{U + u + u)-ux = 



/?[w + w + w] — [w]/? /?(v + v + v) 




[wjp- 



R' p' 





(13) 



where the equalities follow from Eqs. ^ and | |T2| |. and the fact that {X ')' = 

-ux-^ ED. 

To get a non-redundant state vector, we represent the orientation deviation R as 
a rotation of angle ||f|| about axis f G M?. Assuming this deviation is sufficiently 
small, it is approximated well by the following first-order Taylor expansion: 



^ = /+[f]. 



(14) 



By substituting Eq. ( fT4l l into Eq. ( fT3l l. and ignoring all second-order error terms, 
we get to first order (using the identities [a]b = a x b = — [b]a and [a] [b] — [b] [a] = 

[a X b]): 
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p = [rjv + v + v- [wjp = -[wjp- [vjr + v + v, 
[fl' = ffl [wl + fw + wl - [wl [fl = f- fwlfl + fw + wl 



Combining Eqs. ( flSl l and ( fT6t , we get in matrix form: 





p 

r 


+ 


V 

w 


+ 


V 

w 



which we can write as 



x' = Fx + Gu + m, m-^(0,M), 



(15) 
(16) 



(17) 



(18) 



where x = [?] , f = [ q rli ] , m is as defined in Eq. ^, and u and G are defined 
as follows: 

"001000' 
00000 1 



V — V 




w — w 


, G = 


VK — vk 





000 100 



(19) 



Let us discretize time into stages of duration T and assume that the control inputs v, 
w, K and V, w, k and variance M are constant for the duration of each stage k. The 
path n then consists of a series of states and control input twists {Xq ,Uo,... ,Xe,Ui), 
where i is the number of stages of the path, such that 

Xk+i^XkexpitUk). (20) 

We can then integrate Eq. (fTSl l to get [IS]: 

Xk+i=Ai,Xk + BkUk + ni„ ni-^(0,Afi), (21) 

where 

Ak^e^^t^ B^ = /%('-')^*Gdf, Nk= f e'^'-'^'''Mke'^'-'^''k dt. (22) 

70 Jo 

Eq. ^T\\ is the linearized and discretized motion model of the deviation of the needle 
state from the path. 

The sensor model (see Eq. (fTTl ') is discretized by assuming that in each stage k 
we obtain a measurement z^^. To relate the state deviation vector x (as opposed to 
the state matrix X as in Eq. (fTTb ) to a measurement z, we define (note that p and f 
are part of x): 

r/+[f] p" 



hk{x,(i}^h{Xu 



1 



(23) 



where we use the fact that X — XX (see Eq. ( IT2t ) to reconstruct the state matrix X 
from the state deviation vector x. Linearizing A^ around the path Tl gives 



ik=HkXk + Wkqk. qk^'^i^.Qk), 



(24) 
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where 

z, = Zi-/j,(0,0), //, = ^(0,0), M4 = ^(0,0). (25) 

ox aq 

Eq. jTM is the Unearized and discretized sensor model of the deviation of the needle 
state from the path. 

4.2 Kalman Filter and LQR Controller 

Eqs. ( l2Tb and ( l24b form a standard linear Gaussian model, of which x is the state, u 
the control input and z the measurement. The Kalman filter keeps track of the esti- 
mate X and variance P of the true state x during control. It continually performs two 
steps; a process update to propagate the applied control input u, and a measurement 
update to incorporate the obtained measurement z: 
Process update step: 

^'k =Ak-iXk-u (26) 

P^- = Ai._iPi_iA[_i +Nk^i. (27) 

Measurement update step: 

Kk = P^hKh^P^-hJ + WkQkW[)-\ (28) 

Xk=Kkizk-HkXi;), (29) 

Pk = (I - KkHk)P^- . (30) 

The Kalman-gain matrices Ki^ can be computed in advance (i.e. before execution) 
given the initial variance Pq, without knowledge of the actual control inputs u and 
measurements z. We refer the reader to ll28l for additional details. 

The LQR controller provides optimal control for a motion model of the type 
given by Eq. (l2Tb [5J. The optimal control inputs Uk are found by minimizing the 
cost function 



mm 



(i(x[Cx, + u[/)a,)), (31) 



which quadratically penalizes deviations from the path 11 through given positive- 
definite weight matrices C and D. Matrix C specifies the cost for deviating from 
the planned path, while D specifies the cost for deviating from the planned control 
input. Penalizing the magnitude of u is reasonable as the linearized motion model is 
only valid when u is small. 

Solving Eq. (l3T1 l gives the control policy u^^ = Lf^x/t, for feedback matrices L^ 
that are pre-computed using a standard recursive procedure (for < ^ < £) |[5]|. As 
the true state x,;. is unknown, the estimate x^^ of the state which is obtained from 
the Kalman filter is used to determine the control input Uk at each stage k during 
execution of the path. Hence, the control policy is u*: = Lj^xi^. We refer the reader to 
Il5l l25l for additional details. 
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The actual control inputs v^, Wk and Kf, that are applied to the needle are found 
using Eq. (fT9t . given u*^ and the control inputs v^, Wk and k"^^ of path JJ. After appli- 
cation of the control input, the Kalman filter produces the estimate of the next state 
from which in turn a new control input is determined. This cycle repeats until the 
execution of the path is complete. 



5 Optimal Path and Sensor Planning 

The first objective is to plan a needle path towards a target location g inside a 
workspace W <ZM? . For simplicity, we assume that the workspace is the rectangular 
region [0,jfmax) x [Ojjmax) X [0,Zmax), and that the needle can enter the workspace 
at any point in the plane z = 0. Further, the workspace may contain obstacles de- 
fined by a region O cW that models impenetrable or sensitive tissue. The second 
objective is to select a placement of the sensor that will provide feedback during 
control. 

The quality measure of a path TI and a sensor placement is the probability that 
the needle will intersect obstacles when the path is executed using LQG control. 
We will first discuss how to compute this probability for a given path Tl and a 
given sensor model, and then discuss how we plan a set of candidate paths and 
sensor placements from which we select the pair that minimizes the probability of 
intersecting obstacles during control. 



5.7 Obstacle Intersection Probability along a Path 

Given the LQG controller for a path Tl and sensor model h (see Section |4|, we can 
analyze in advance how the true state x, and the estimated state x, will evolve dur- 
ing control as functions of each other. The evolution of the true state x, is dependent 
on the estimated state through the LQR control policy and the evolution of the es- 
timated state X, is dependent on the true state through the measurement obtained 
in the Kalman filter. This gives the following equation in matrix form (see |251 for 
more details): 



I 



Alt BkLj^ 

Kk+itiii+\Ak Ai. + BkLk — Kk+iHk+iAk 



(32) 



Ha 

q-t+i 



<lk+i 



'■^{0, 



Nk 

Qk+i 



), 



which we write shorthand -for the appropriate definitions of y^, 1^, Vk, s^ and Z^- 
as (note that Yi^, Vj. and Z^ can all be computed in advance for a given a path i7): 



yA-+ 1 = Ykjk + VkSk, Sa ~ o/K (0, Za ) . 



(33) 



From this, we can compute the mean ja and the variance E^ of y^ = [j ] for any 
stage k of the execution of the path: 
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y^+i = Ykfk, h = 0, (34) 



^k+ 1 = YAY[ + VkZkV[ , Eo = 



'PoO 




(35) 



Note that the mean y^. is zero for all stages t. Hence, [^*] ^ ^{0,Eic). 

Given these a priori distributions of the state deviation, we can compute the prob- 
abihty that the needle will intersect an obstacle during the execution of path il. 
Let Ef be the variance of the position deviation p^, which is the upper-left 3x3- 
submatrix of the 12 x 12-matrix Z^t (note that x = [ -] ). As p = ^^(p — p) (see Eq. 
(fT2l [). we have that p = p + ^p, so the a priori distribution of the position p^^ of the 
needle tip at stage k along path H is given by ,jV {^ii,Ri,E^R^) . Hence, the prob- 
ability pii that the needle intersects the obstacle region at stage k along path Tl is 
computed as: 

„ / exp(-i(b-p.)"(4fP4")'Hb-PA-)) .. ,-^, 

Pk = / , db, (36) 

which is the integral over the obstacle region O of the probability density function 
of Gaussian ^{pk,RkP^Rl). 

Instead of computing the probabilities pk exactly, we will use an approximation 
that can be computed efficiently. To this end, we look at the maximum factor by 
which the ellipsoid of one standard deviation of the a priori probability distribution 
can be scaled such that it does not intersect an obstacle. Let this factor be c^ for 
stage k along the path. For a multivariate Gaussian distribution of dimension n, the 
probability that a sample is outside c^ standard deviations is given by 

Pk^l-r{n/2,cl/2), (37) 

where F is the regularized Gamma function ifTTl . It provides a (conservative) upper 
bound of the true probability of intersecting obstacles at stage k. 

The value of Ck for stage k is efficiently computed by using a collision-checker 
capable of performing distance calculations and linear transformations on the obsta- 
cle geometry, such as SOLID 1261 . Let Ek be a matrix such that EkEj^ — U^. The set 
of positions within one standard deviation is then an ellipsoid centered at the mean 
Pk obtained by transforming a unit sphere by Ek- Transforming the environment by 
Ej^ (such that the uncertainty ellipsoid becomes a unit sphere), and calculating the 
Euclidean distance between pk and the nearest obstacle in the transformed environ- 
ment gives the value of q for stage k, from which the approximate probability pk of 
intersecting obstacles at stage k can be computed using Eq. dJTJ ). 

Assuming (somewhat opportunistically) that the probabilities pk are independent, 
it follows that the probability p{n) that the needle intersects the obstacle region 
anywhere along path TI is given by p{n) — 1 — Ha— o(l ^ Pk)- 
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5.2 Planning a Needle Path and Sensor Placement 

To plan an optimal pair of a needle path and sensor placement, we (randomly) gen- 
erate large sets of possible needle paths and sensor placements, and evaluate the 
probability of intersecting obstacles for each pair. 

To generate a large set of (random) needle paths we use the RRT (random rapidly- 
exploring tree) algorithm fT4\, as it is known to create trees of paths that uniformly 
cover the space and handle kinematically constrained systems, such as steerable 
needles, well. As the target location is a specific point g, and the entry location 
can be anywhere in a pre-defined entry zone, it is convenient to plan backwards 
from the target location. Backward kinematics are identical to forward kinematics, 
except that the (forward) control inputs v, w, and k are integrated over a negative 
time-step. As the actual path traced out by the needle only depends on the ratio vv/v 
and not on the values of the individual terms l^, we set v = 1 cm/s and only vary 
vt> £ [— Wmax, Wmax] and K" G [0, K"o). We will not describe the RRT-planner in detail 
here, but refer to ll20l [29l [30l for details on RRT implementations for steerable 
needles. All the paths in the resulting tree that reach the entry zone are valid. 

Let S^ be the space of sensor placements, and let the sensor model h{X,q) cor- 
responding to a placement s G S^ he denoted hs{X,q). We generate a large set of 
placements by random sampling from ^. We iterate over all valid paths contained 
in the RRT-tree and all placements sampled from S', and select the pair for which 
the needle has minimal probability of intersecting obstacles as computed above. 

6 Simulation Results 

We experimentally evaluate our approach using simulations of the computed can- 
didate paths using LQG control with simulated process and measurement noise. In 
our experiments, we use an anatomical model of the human male pelvic region to 
simulate needle insertion in tissue for delivering radioactive doses to targets within 
the prostate region for cancer treatments. We first show that the needle is control- 
lable along a candidate path. We then show the effect the sensor can have on the 
optimal path, and how the sensor placement can be optimized for a given problem. 
We implemented the system in C-n- and tested it on a 3.33 Ghz 4-core Intel® i7™ 
PC. All experiments utilized only a single core for computation, but our approach 
could be parallelized over multiple cores to yield significant speedups. In our ex- 
periments, we model the needle motion and noise using the following parameters: 
Wmax = 2rc rad/s, T = 0. 1 s, k" = 0.2 cm^' , M is a diagonal matrix with 0.01 (cm/s)^ 
for the position components and 0.05 (rad/s)^ for the rotational components, and Q 
is a diagonal matrix with 0.05 cm^ along the diagonal. 

6.1 Needle Controllability 

We first demonstrate the controllability of the needle along a candidate path using 
an LQG controller with artificially generated process and measurement noise. We 
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Fig. 3 (a) Given a candidate path (gray) from left to right, we illustrate samples (red spheres) 
from 100 simulated executions of the LQG controller and extended Kalman filter showing 
convergence to the path. The simulations included artificially generated process and mea- 
surement noise. The blue ellipsoids show the a priori distributions computed by the planner 
along the path, (b) A simulated example trajectory (shown in ref) using LQG control. 

assume that the sensor can only measure the position p of the needle tip and not the 
orientation. Fig. 3a shows the samples obtained from 100 simulations of executions 
of the path using the LQG controller, demonstrating that the a priori probability dis- 
tributions computed by the planner are close approximations of the true distribution 
of the states along the path and that the needle follows the candidate path closely. 
For an insertion length of 11.25 cm, the standard deviation of the distance to the 
target was found to be 0. 17 cm. 

To emulate uncertainties arising due to tissue heterogeneities, we applied spa- 
tially varying process noise sampled from Gaussian distributions with zero mean 
and variances up to three times the variance M. Following the same candidate path 
of length 11.25 cm, the standard deviation of the distance of the final needle-tip 
position to the target across 1000 simulation runs was 0.24 cm. 

6.2 Effect of Sensor Placement on Optimal Path 

We demonstrate the use of LQG to select plans that minimize the probability of 
failure and also show the effect of sensor placement on the optimal path. We per- 
formed these experiments using an anatomical model of the human pelvic region 
(see Fig.|4]). As an example sensor model, we consider a 2-D image of the anatomy 
(for example, an x-ray or 2-D ultrasound image). The image is projected along the 
z-axis using an imaging device, from which we can only measure the x- and y- 
coordinate of the position p = [-i:,>',z] of the needle tip. This gives the following 
observation function h (note that p is part of X): 



h{X,q) 



q, q^^(0,Q) (38) 



Fig.|4](left) shows the optimal needle path within the RRT-tree for this sensor. The 
optimal path predominantly lies in the x-y plane to minimize the uncertainty along 
the viewing direction (z-axis). It took 41 seconds to generate a set of 1000 candidate 
paths and selecting an optimal path from this set required 4.6 seconds. Similarly, 
if we instead place the sensor such that it obtains 2-D images projected along the 
y-axis, the optimal path (shown in Fig. |4] (right)) is predominantly in the x-z plane. 
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Fig. 4 Optimal needle paths in an anatomical environment modeling the human prostate and 
surrounding tissues. Two examples of a sensor placement, in which (left) only the x- and 
v-coordinate and (right) only the x- and z-coordinate of the needle-tip are measured by an 
imaging device by projecting the anatomy on the imaging plane (shown in the two views). 
The optimal path predominantly lies in the imaging plane to minimize uncertainty in the 
viewing direction. 

To quantify the effect the sensor location has on the probability of success of a 
path, we compare the results of two paths assuming sensing along the z axis: the 
optimal path and a path that lies predominantly in the x-z plane. For each path, 
we estimated the "ground truth" probability of success by performing 1000 simu- 
lated executions using the LQG controller with artificially generated process and 
measurement noise. The path that is optimal according to our method, which lies 
predominantly in the x-y plane, has a probability of success of 89%. In contrast, the 
path that primarily lies in the x-z plane had a probability of success of only 44%, 
which is to be expected since there is greater uncertainty in needle pose along the 
viewing direction (z-axis). 

6.3 Optimizing the Sensor Placement 

In many clinical procedures involving bevel-tip steerable needles, the physician can 
select where to place an intra-operative sensing device for real-time feedback. We 
consider the case of an x-ray imaging sensor attached to a C-arm, a commonly used 
setup in operating rooms that allows the physician to rotate the sensor in a circle 
about the patient as shown in Fig. |5] The placement of the imaging device can be 
parameterized by angle 6 relative to the horizontal axis of the patient. A 2-D image 
of the anatomy can be obtained by parallel projection along the viewing direction 
(along the radius of the arm). This gives the following observation function h: 



KX,q) = 



ycosO +zsin6 



q, q^^(0,e) 



(39) 



To optimize sensor placement, we iterate over all paths computed by the RRT-based 
planner and all sampled sensor placements (obtained by varying the angle of rotation 
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Fig. 5 Using an x-ray imager mounted on a rotating C-arm, it is possible to rotate the sensor 
about the horizontal axis along which the patient is positioned (left). The anatomy as viewed 
from the computed optimal sensor placement (right). The optimal path predominantly lies in 
the imaging plane to minimize uncertainty in the viewing direction. 

about the horizontal axis in regular increments), and select the pair for which the 
needle has minimal probability of intersecting obstacles. Fig. |5] shows the optimal 
sensor placement for a given set of candidate paths and a 2-D view of the anatomy 
as visible from the imaging device. For a set of 1000 candidate paths and 36 possible 
sensor placements (obtained by discretizing over the interval [0, n] in intervals of 5 
degrees), our implementation took 185 seconds to compute the optimal pair over the 
set of candidate paths and sensor placements. It should be noted that with modern 
multi-core processors, this computation can be trivially parallelized to bring down 
the computation time within clinically acceptable limits. 



7 Conclusion and Future Work 



In this paper, we presented a technique for planning and controlling flexible steer- 
able needles towards a target location in 3-D anatomy with obstacles such as bones 
and sensitive organs. Our approach minimizes the probability that the needle in- 
tersects obstacles by explicitly taking into account both needle motion uncertainty 
and the sensors used to obtain (noisy, partial) feedback on the needle state. We 
demonstrated how the sensor influences the optimal path and optimize over the set 
of candidate paths and feasible sensor placements to select the pair for which the 
estimated uncertainty is least likely to cause intersections with obstacles. 

In our current implementation, the LQR controller does not bound the control 
inputs within permissible limits during feedback. This can be a problem when the 
control for needle curvature exceeds the attainable curvature of the needle. We plan 
to address this issue in future work. We also plan to incorporate uncertainty intro- 
duced by tissue deformation using a physically accurate FEM simulator ^. 

We also plan to evaluate our new planning and control approach using a robotic 
device that steers a needle in phantom tissue as in prior 2-D experiments I22i . We 
will utilize the ability of this planner and controller to account for unexpected events 
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that might occur due to tissue inhomogeneity, tissue deformation, and estimation 
errors in motion model parameters, as well as to optimize sensor placement to min- 
imize the probability of intersecting an obstacle. 
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Cyber Detectives: 

Determining When Robots or People Misbehave 



Jingjin Yu and Steven M. LaValle 



Abstract. This paper introduces a problem of validating the claimed behavior of 
an autonomous agent (human or robot) in an indoor environment containing one 
or more agents, against the observation history from a sparse network of simple, 
stationary sensors deployed in the same environment. Following principles of dy- 
namic programming, we partition the decision problem into incremental search over 
a sequence of connectivity subgraphs induced by sensor recordings, which yields 
efficient algorithms for both single and multiple agent cases. In addition to imme- 
diate applicability towards security and forensics problems, the idea of behavior 
validation using external sensors complements design time model verification. 



1 Introduction 

One night, a crime was committed in an office building with complex interior struc- 
ture. The next morning, a few suspects were identified but none of them would come 
forward. Instead, all of them provided seemingly convincing stories that excused 
them from being present at the crime scene. Unknown to the suspects, however, the 
building's security system, composed of a set of sensors with different capabilities, 
had made a sequence of recordings of passing people. Knowing that the criminal 
among the suspects was lying, can we use the sensor recordings to help solve the 
crime? 

Similarly, in computer science, robotics, and control, a frequently encountered 
problem is verifying that an autonomous system, be it a program or a robot, is per- 
forming as designed. For example, a service robot may plan a path to clean office 
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rooms one by one. Due to internal (sensor/actuator/computing units malfunction- 
ing) or external factors (strong electromagnetic interference for example), the robot 
may mistake one room for another and fail to accomplish its task without knowing 
that it has failed. A robot or a system may also be compromised for malicious pur- 
poses, producing intentionally bogus records of its actual path to hide the fact. In 
such cases, it would be highly desirable if external monitoring could automatically 
determine that a robot has faltered. 

In this paper, we introduce realistic abstractions of above problems and show that 
such formulations are computationally tractable. Specifically, one or more agents 
(robots or people) are assumed to move in an indoor environment, of which regions 
are monitored by external sensors (beam detectors and occupancy sensors). We as- 
sume that the agents are not aware of these sensors. From a story told by an agent, 
which is a sequences of places in the environment it has visited, and combined 
recordings of these sensors, we provide polynomial time algorithms (with respect 
to the complexity of the environment, the length of the story, as well as the length 
of the observation history) for the inference problem of whether the given story is 
consistent with the sensor recordings. 

Our work takes inspirations from two active research topics in robotics and con- 
trol. If one assumes that the behavior of a set of moving bodies is largely unknown, 
above problem becomes inferring various properties of these moving bodies with a 
network of simple sensors. Binary proximity sensors have been employed to esti- 
mate positions and velocities of a moving body using particle filters [3!] and mov- 
ing averages iflTl . The performance limits of a binary proximity sensor network in 
tracking a single target are discussed and approached in ll25l , followed by an exten- 
sion to the tracking of multiple targets 1261 . The task of counting multiple targets 
is also studied under different assumptions EKTSl- In these works, the sensor net- 
work's aggregate sensing range must cover the targets of interest at all times, which 
is much more difficult to implement than guarding critical regions of an environ- 
ment. When only subsets of an environment are guarded, word problems in groups 
IIT2I fT4ll naturally arise. For the setup in which targets moving inside a 2D region 
are monitored with a set of detection beams, |i2Fl characterizes possible target lo- 
cations, target path reconstruction up to homotopy, and path winding numbers. In 
this domain, the surfacing of more interesting behaviors also induces an increase in 
complexity; few efficient algorithms exist. This prompts us to ponder: Can we do 
better if partial knowledge of a target's behavior is available? In viewing its resem- 
blance to the questions asked in Il3l l25l[28]| . our problem requires the design of a 
combinatorial filter, similar to those in Il20l|29l[30l . These combinatorial filters are 
minimalist counterparts to widely known Bayesian filters ll6ll9l [T3ll2T1 122112711311 . 

On the other hand, if sensors external to moving bodies are ignored, one is left 
with the task of systematically verifying that the moving bodies do not have un- 
expected behaviors. Complex moving bodies such as robots are often modeled as 
hybrid systems. Existing verification techniques either address subclasses of hybrid 
systems or approximate reachable sets of such systems ll2ll8l lT0ll . because the prob- 
lem of verifying a system with continuous state space and control input is generally 
undecidable lH). In practice, this difficulty translates into the necessity of external 
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measures to safeguard the unverified portion of a system. Alternatively, when high 
level task specifications can be coded as General Reactivity(l) formulas [23 J . the 
task of composing controllers into verifiably correct hybrid automata can be car- 
ried out automatically using linear temporal logic ifTTlfTSl . Even for such provably 
correct designs, malfunction can still occur due to sensor/actuator/computer errors. 
Keeping these systems in check again requires monitoring with external sensors. 

The main contributions of this paper are twofold. First, using a sparse network 
of simple sensors to validate the claimed behavior of an autonomous agent intro- 
duces a new methodology that complements traditional system verification tech- 
niques such as II2II8I IIOI . We believe this is a necessary approach given that most 
verification processes focus on high level abstractions of an autonomous system, 
which only models simplified, ideal behavior. Second, applying principles of dy- 
namic programming (5^, we show that polynomial time algorithms exist for the 
proposed decision problems, providing insights into the structure of these detective 
game like problems. Moreover, the practical algorithmic solution may readily find 
its way in real world applications, such as system design/monitoring/verification, 
security, and sensor-based forensics. 

The rest of the paper is organized as follows. Section |2] defines the two detec- 
tive games we study in this paper. For the case in which a single agent triggers all 
sensor recordings. Section [3] extracts a base graph structure that captures the con- 
nectivity of the environment, which is subsequently broken down into pieces for 
the incremental search introduced and analyzed in Section |4l Section |5] extends the 
graphs and search algorithm to account for additional agents that are present in the 
environmenij. Section|6]discusses many open questions and concludes the paper. 



2 Problem Formulation 

2.1 Workspace, Agents and Stories 

Let the workspace Vl^ C K^ be a bounded, path connected open set with a polygonal 
boundary, dW. Let one or more point agents move around in W, carrying out un- 
known tasks. Every agent has a map of W and may move arbitrarily fast along some 
continuous path T : [tQ,tf] -^ W. The initial time to and the final time tf are common 
to all agents. Assume that we are interested in a specific agent x with a story of its 
own, which may be truthful or fictional. Since an agent may not always have an 
accurate estimate of its state, a truthful story is not necessarily what has happened. 
For example, a human can usually recall a (partial, possibly inaccurate) sequence of 
events after she has performed a task in an environment. In this iteration we let the 
story have a very basic form: A sequence of places in W that agent x has visited in 
increasing chronological order. 



An interactive implementation of algorithms from Section |4] and |5] is available at 

http://msl.cs .uiuc . edu/~JYul8/pe/cd.html. A web browser with Java 1.5 
support is required to access the page. 
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Fig. 1 A simple workspace with an occupancy sensor and a beam detector. Tlie occupancy 
sensor guards the shaded area with three doorways a,b, and c. The beam detector guards the 
vertical line segment at the top. 

V={PUP2,---,Pn), PiCW, 

such that the unique elements of p are each a simply connected region with a polyg- 
onal boundary and pairwise disjoint. The set of all unique elements of p is denoted 
'^p. We assume that for every p ^^p,x has accounted for all its visits to p in p. As 
an example, agent x may simply report "I went from room A to room B, then came 
back to room A, and eventually arrived room C, at which point I stopped." 



2.2 Sensors and the Observation History 

Let a subset of the workspace W be guarded by a heterogeneous set of sensors. The 
placement of sensors in W is unknown to all agents. Among the commonly available 
sensors for surveillance, we focus on occupancy sensors and beam detectors. An 
occupancy sensor is assumed to detect the presence of an agent in a fixed, convex 
subset 5 C W. For example, a room may be monitored by such a sensor (the shaded 
area in Fig. [Hi. A data point recorded by an occupancy sensor o,- has two parts, an 
activation. 



and a deactivation. 



rod = {oi,td), 



in which ?„ is the time when the first agent enters an empty s and tj is the time 
when the last agent exits s. A beam sensor, on the other hand, guards a straight line 
segment, £ cW, between two edges of dW (for example, the red line segment in 
Fig. [T]l. A data point of such a sensor, bt, is recorded as an agent crosses £, which 
can be represented by a 2-tuple: 

rh = {bi,t). 

A beam detector is deactivated right after activation. We further assume that when 
a beam detector is triggered by an agent, the agent must pass from one side of 
the beam to the other side. We denote the collection of all unique sensors in W as 
'^s- With the introduction of occupancy sensors and beam detectors, we define the 
observation history simply as: 

r= (ri,r2,...,r„,), 



Cyber Detectives: Determining When Robots or People Misbeliave 395 

in which each r, = roa,rod, or r/,, is indexed by the time when it occurs, 
incrementally. 

Both occupancy sensors and beam detectors are weak sensors in the sense that 
they cannot tell an agent's passing direction. In the example given in Fig.[T] a sensor 
recording of the occupancy sensor could imply that the agent enters and exits from 
any of the doorways a,b, or c. Similarly, when the beam detector is triggered, an 
agent could be passing it from left to right or in the other direction. These sensors 
certainly cannot distinguish among different agents. We choose to work with these 
two typical but weak sensors so that the algorithms we present apply to a wider 
range of sensors, provided that they are at least as powerful (although the algorithms 
may not take full advantage of these stronger sensors). For example, a video camera 
is a stronger occupancy sensor, capable of providing both passing direction and 
identification of agents. 

Observation History for a Single Agent. Without loss of generality, we assume 
that the sensors' detection regions (field of view) are pairwise disjoint: When two 
or more sensors have overlapping detection regions, we may create virtual sensors 
by repartitioning these sensors' detection regions so that the virtual sensors have 
disjoint detection regions 12011 . This implies that when agent x is the only agent in 
W, the activation of any sensor must be followed by the deactivation of the same 
sensor, with no other sensor activities in between. In particular, in the observation 
history for a single agent, there can be no other sensor activations or deactivations 
between one activation/deactivation of an occupancy sensor. 

Observation History for Multiple Agents. When there are multiple (an unknown 
number) agents in the workspace, it is no longer reasonable to assume that all sen- 
sor recordings have activation-deactivation intervals that are pairwise disjoint. For 
example, one agent may pass a beam detector while another one occupies a room 
monitored by an occupancy sensor. Different occupancy sensors can also have over- 
lapping intervals of activation. When multiple agents are present in W, we assume 
that all sensor activation and deactivation times are distinct, since the likelihood of 
simultaneous sensor triggering is very low. 



2.3 The Verification Problem 

Given W, '^p,'^s, agent x's story p and the observation history r, we are interested 
in determining whether p is consistent with r. For the comparison to make sense, 
we require that both p and r span the same time interval, [tQ,tf]. That is, we must 
determine whether there exists an agent path containing the locations given in p in 
the specified order that triggers the sensor recordings given by r. For the purpose of 
introducing algorithms, we use the example workspace given in Fig.|2]and let agent 
x's story be: 

p = (A,C,B,A,C). (1) 
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Fig. 2 A workspace with two beam detectors bi,b2, two occupancy sensors 01,02, and three 
labeled rooms A,B and C. Thus, ''^p = {A,B,C}, "^s = {^1,^2 101,02}. There are four con- 
nected components Ri through ^4 when regions guarded by sensor range and rooms in agent 
x's story are treated as workspace obstacles. 



In English, agent x told the story that it started in room A and went through room 
C,B,A, and C, in that order. When there is a single agent in the workspace, we let 
the observation history be: 



((/?!, /i), (01, f2), (01,^3), (^2, ?4), (02,^5), (02, fe)). 



(2) 



For this simple example, it is not hard to see that p is not consistent with r: B can 
only be visited after agent x passes b2 from left to right; however, after visiting B, 
either b2 or 02,01 must be triggered for x to visit A once more. When there are 
multiple agents in the workspace, we let the observation history be slightly different 
(with which p is consistent): 



r =((^i,fi), (01,^2), (02,^3), (^2, '4), (02,^5), (01, ?6))- 



(3) 



In the example, we have implicitly made the assumption that elements of "^p and 
elements of '^^ have coverage regions that are pairwise disjoint; overlapping cases 
will be handled after the main algorithms are introduced. 



3 The Connectivity Grapli and Sensing Induced Subgraphs 

Both occupancy sensors and beam detectors, when not triggered, act as obstacles 
that change the workspace connectivity. When a sensor is triggered, the part of the 
workspace blocked by that sensor is temporarily connected. To explore the structure 
from this intuition, we first build a connectivity graph G that captures the topolog- 
ical features of W. As we are only interested in finding a patqj through p that is 
compliant with r, we only need G to capture how elements of "^p are connected and 
how they are connected to the sensors, ^^5. Therefore, we treat these elements as 



^ In graph theory, a path does not visit one vertex multiple times. Therefore, the image of a 
continuous path, when discretized, becomes a walk in graph theory terminologies, since it 
may visit a vertex multiple times. In this paper, we abuse the term path slightly to denote 
both a continuous function T : [to,tf] -^ W and the corresponding walk in a discrete graph. 
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vertices of G. Since there are two possible directions that an agent may pass a beam 
detector, two vertices are needed for each beam detector. A single vertex is needed 
for each element of '^p and for each location guarded by an occupancy sensor. For 
the example from Fig. 121 the collection of vertices is 

V = {A,B,C,oi,02,bu„bid,b2hb2r}, 

in which bi„,bid are the upper and lower sides of bi, respectively (these two sides 
are naturally obtained if a beam is represented as two oppositely oriented edges, as 
commonly used in computations involving polygons). Similarly, b2i,b2i- are the left 
and right sides of £>2. 




(a) 



(b) 



Fig. 3 a) The connectivity graph of the example given in Fig.|2] b) An alternative connectivity 
graph including connected components of Wfree as vertices. 



To connect the vertices, we need to obtain the connectivity of the workspace al- 
gorithmically, treating the regions occupied by elements of '^p and '^^ as obstacles. 
Denote the workspace excluding these obstacles as Wfree- Determining the connec- 
tivity of Wfree IS equivalent to finding the connected components of Wfree- We call 
the subroutine that does this BuildConnectivityGraph but omit the code since 
it is a fairly standard procedure^ Applying this procedure to our example yields the 
connectivity graph G = {V,E) given in Fig. |3la)- We point out that there are other 
choices in constructing the connectivity graph. For example, following an (more 
natural) equivalence class approach, we may alternatively build the graph based on 
how regions Ri through R4 are connected (Fig.[3jb)). We may further treat sensors 
and rooms as directed edges. There are no fundamental differences between these 
choices for our purpose: Although the later two provide simpler graphs, slightly 
more sophisticated graph search routines would then be needed. 

With G constructed, we can now explore the extra information provided by the 
observation history: The relative timing of sensor recordings. This information 
essentially partitions G into different pieces at different time instances. In this 
section we focus on the case of workspace with a single agent. In the observation 
history given in (|2]l, bi is the first sensor that is set off. This means that at the time 



^ One efficient way of doing this is to apply a cell decomposition procedure (see 1191 , 
Chapter 6), such as vertical cell decomposition Q, to Wfree and then combine the cells 
that share borders. For more details, please refer to the extended version of the paper at 

http://msl.cs.uiuc. edu/~JYul8/wafrlO/full .pdf 
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right before ti when the sensor is activated, the agent must be at either £>!„ or bid- 
During the time interval [/o,/i), since i>2,oi, and 02 are inactive, they act as obsta- 
cles. The part of G that the agent may travel during [/o,/i) is then given by Gi in 
Fig.m in which A is the start vertex and biu,bici are the possible goal vertices. Ver- 
tex B does not appear in Gi because it is not reachable. Similarly, we obtain the 
subgraphs of G during time intervals {ti, 12), (13,(4), {t4,ts),{t(,,tf] as G2 through G5 
in Fig. m respectively. Graph G2 has two parts since there are two possible start 
vertices. Note that when the start and goal vertices in these subgraph correspond to 
sensor vertices, they can be visited only once as the start vertex or the goal vertex. 
The pseudocode is given in GetSubGraph (Algorithm[Tl). The algorithm calls the 
subroutine GetReachableSubgraph (G, .J, Vc,Vg) (Algorithm|2li, which returns 
the part of G reachable from s, passing only vertices in Vc- If Vq is not empty, then 
a path from s must also end at vertices of Vq- We separate this subroutine since it 
will be reused. In Algorithm[2l subroutine ConnectedComponent(G,5) returns 
the connected component of G containing s. We note that, although it is possible to 
work with G directly instead of working with these subgraphs, they will be help- 
ful in understanding the algorithm and in complexity analysis. Moreover, it can be 
a good heuristic to build these subgraphs to restrict search in problems with large 
workspaces. 



Algorithm 1. GetSubGraph 

Input: G = {V,E), the start vertex s, ^£p, and goal vertices Vc 
Output: G' = [V' ,E'), the part of G that is reachable from i 



1: Kc^-^UW 

2: return GetReachableSubgraph(G,.s, Vc,Vc) 



Algorithm 2. GetReachableSubgraph 



Input: G={V,E),s,Vc,Vc 
Output: G' = {V',E') 

1: for all edges (v,-, vy) e E such that v,-, vy e Vc do 
2: add (v;,vy) to£' // V" is also updated. 

3: end for 

4: G' ^ CONNECTEDCOMPONENT(G',i) 

5: if Vg is not empty then 

6: for all vi,vj such that v,- e V', vj e Vc do 

7: if (v;,Vj) e E then 

8: add (v/,Vj) to£' 

9: end if 

10: end for 
11: V'^V'UVc 

12: end if 

13: return G' 
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Fig. 4 Tlie subgraphs of G induced by the sensor observation history. The green vertices are 
possible start positions and the red vertices are possible goal positions. 



The correctness of Algorithm [T| through |2] is by construction, which is straight- 
forward to verify. We now give an estimate of the worst case performance of these 
algorithms. Let Wfree iiave an input size of «„., BuildConnectivityGraph has 
time complexity 0{nl,). In subroutine GetReachableSubgraph, the subroutine 
for obtaining connected components takes time linear in n„ fXM- The complexity is 
then decided by the for loop at line 6 and the membership check at line 7, which 
takes no more than 0(|VG|«n,lg«H') in total. 



4 Validating a Single Agent's Story against an Observation 
History of a Single Agent 

When there is a single agent in the workspace, every sensor recording is triggered 
by that agent. In this case, supposing that we have the subgraphs of G, the rest of 
the work becomes searching through these graphs, one by one, for a path that agrees 
with the agent's story. A straightforward approach is to connect one subgraph's 
goal vertices to the next subgraph's start vertices and perform an exhaustive search 
through paths to see whether there are matches. Such naive algorithms are not scal- 
able, however, since every beam detector can require connecting the subgraphs in 
two ways (for example, G21 , G22 in Fig- Hi. The number of search paths through the 
subgraphs is then exponential in the number of sensor recordings on average. In the 
worst case, breadth-first or depth-first search through all these graphs may take an 
exponential amount of time. 

To organize the search more efficiently, we first connect the subgraphs to get 
a better understanding of the topology of the graph to be searched. To make the 
structure more explicit for search, we also make the subgraphs directed. Doing this 
to all sensing induced subgraphs of G yields the graph illustrated in Fig. [S] Denote 
this graph G,. The problem of validating p against r becomes searching through Gj 
for a path p' such that, after deleting the vertices corresponding to sensors from "^s, 
p' is exactly p. We observe that, since Gs contains at most 2{m+l) copies of G, any 
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Fig. 5 Part of the composite graph Gs built from the connectivity graph G and sensor 
observation history r. 



element of '^p Wifs cannot appear more than 0{m) times in Gs. This observation 
indicates it may be possible to apply the principles of dynamic programming to 
partition of the search problem into subproblems: Each subproblem is validating a 
tail {pi,...,p„) ofp, starting from a subset of vertices of Gi corresponding to /7,_i. 
The total number of subproblems per p,- is 0{m); if going from one subproblem to 
a smaller subproblem takes polynomial amount of time, then the total time spent on 
searching Gs is also polynomial. 

This turns out to be the case for our problem. Before formally introducing the 
algorithm, we illustrate how it operates with the provided example. We write the 
agent's story compactly as p = ACBAC. Since agent x starts in A, we are done with 
pi =A, leaving CBAC to validate. For p2 = C, it is possible to reach from A in Gi to 
copies of C in Gi, G21 (passing bi„, bid) , and G3 (passing Z7i^/,Z7i„, 01). The copy of 
C in G22 is not directly reachable from A in Gi , passing only vertices from sensors. 
We may write the three subproblems as (For P E^p, Pq. denotes that the copy of P 
is from the subgraph G;. For example, Aq^ denotes the copy of A from the subgraph 
Gi): 



^Gibiub\dCG2i 

AGibuibiuOiCc^ 



BAG, 
BAG, 
BAG. 



Since there are multiple subproblems for p^, = B, going through these subproblems 
individually may introduce a factor of 0{m) per problem; there can be 0{m) sub- 
problems, which will contribute a factor of 0{m^) to the overall running time. To 
avoid this, we again use the sequential nature of Gs. Instead of processing each sub- 
problem individually, we process all of them together, staged at each Gj. For our 
example, the first subproblem starts with the copy of C in Gi : It is possible to go 
through b III, bi„ and get to oi . We now pick up the second subproblem and see that it 
is possible to go from C in G21 to oi as well. At this point, the first two subproblems 
collapse into a single subproblem. Going into G3, we pick up the third subproblem. 
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For the copy of Cin G3 , since it must pass A to reach B, this subproblem dies; we are 
left with a single subproblem to reach B from i>2i in G3. Following above procedure, 
we obtain two subproblems after processing pi = B: 



AGibli,bijCG2iOlb2lb2rBG4 
Ag, biub\dCG2iO\b2\b2r02BGi 



AC, 
AC. 



Note that we do not keep all valid paths in this search; doing so will require space 
exponential in m. After all of p is processed, if some subproblems survive, then p is 
consistent with r; any surviving subproblem also provides a feasible path. 

Algorithm 3. ValidateAgentStory 

Input: G,p= {pi,...,pn),r= (n,...,^™) 

Output: true if p is consistent with r, false otherwise 



1: 

2: 

3: 

4: 

5: 

6: 

7: 

8: 

9: 

10: 

11: 

12: 

13: 

14: 

15, 

16 

17: 

18 

19 

20: 

21, 

22: 

23: 

24: 

25: 

26: 

27, 

28 

29, 



V/^{pi} 

for 7 = 1 to m + 1 do 

initialize Vc as an empty set 
if IsDEACTIVATION(ry) then 

continue 
end if 
if (j 7^ m+ 1) then 

Vc ^ SENSORVERTICES(r,) 

else 

empty Vg 
end if 

Gj ^ GetSubGraph(G, V/,-^,,, Vg) 

end for 

G, ^Chain(Gi,...,G„,+i) 

initialize V,, K/ as empty sets of two tuples 

Vj <— {(pi , 1)} // A two tuple is a vertex of Gs 

for I = 2 to n do 

for y = 1 to m + 1 do 

if (p; , j) adjacent to (p,_ i,k) <E Vs for some k< j then 
if i == n&&j == m + 1 then 

return true 
end if 

add(p,-,j)toV; 
end if 
end for 

K, ^ Vj'; empty V,' 
end for 
return false 



The pseudocode is summarized in Algorithm[3l Subroutine IsDEACTlVATlON(r) 
returns true only if r is the deactivation of an occupancy sensor. The subroutine 
SENSORVERTlCES(r) returns the vertices of G induced by the sensor in a sensor 
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recording r. The subroutine Chain (. . .) connects all input graphs sequentially based 
on sensor crossings, which is trivial to implement. In the code, we use (pij) to de- 
note the the copy of /?,- in subgraph Gj. The correctness of ValidateAgentStory 
follows from its construction based on dynamic programming, which we briefly 
corroborate. After each p, is worked on, there are up to 0{m) subproblems since 
there are no more than 0{m) copies of pi in Gs- Because the further observation 
that Gs is sequential, the suproblems for each /?,■ can be processed in a single pass 
of Gs- We make 0{m) calls to GetSubGraph, which takes 0{mn„\gn„) total 
time (Since \Vg\ < 2 in calls to GetSubGraph). Going through the for loops, it is 
straightforward to get that the rest of the algorithm has complexity no worse than 
0{n-m\gn„) = (9(«mlgn„,). The worst case running time is then upper bounded by 
0{m{n + n„)\gn„). 

As mentioned in the problem formulation, we have assumed that "^^p and "^s do not 
overlap in M? . These are not included in the above algorithm to avoid complicating 
the presentation. What if some p £^p and s £ 'iSs do overlap? There are several 
subcases. If the regions of p^s coincide (for example, there may be an occupancy 
sensor in room A), this essentially breaks the problem into several smaller problems, 
to which the above algorithm applies. If i' C p, agent x then must go through p to 
reach .s, in which case we can build G to make s a vertex connecting to p only. 
The same applies if /? C s. In the last case s^ p partially overlap but do not include 
each other; we can treat s^p as three regions: s\p as an sensor, p\s as a room, and 
5 n /7 as a fully overlapping sensor and room (this case only happens to occupancy 
sensors, not beam detectors). This will split the verification problem into several 
subproblems, which may induce exponential growth in running time. However, the 
last case is not likely to often happen since occupancy sensors are usually placed to 
guard an entire room. We can also minimize such a problem by carefully placing 
the sensors. 



5 Validating a Single Agent's Story against an Observation 
History of Multiple Agents 

When there are multiple agents (an unknown number) in the workspace, two com- 
plications arise. First, as mentioned in Section [31 when multiple agents are in the 
workspace, there can be many agents in the region monitored by an occupancy sen- 
sor during one of its activation-deactivation time interval. Effectively, this allows 
any agent to temporarily go through the region monitored by an activated occu- 
pancy sensor. This suggests that the recordings from occupancy sensors should only 
be treated as events that change the connectivity of the environment. That is, whether 
agent x is the agent that triggered the activation/deactivation of an occupancy sen- 
sor is not relevant. Second, agent x may not be responsible for all beam detector 
recordings. Instead, it may trigger any subsequence of sensor recordings. For an 
observation history sequence of length m, there are up to 0(2'") possible subse- 
quences; agent x may have triggered any one of these subsequences, but not the rest 
of r. 
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To overcome these difficulties, we start by examining how the composite graph 
Gs changes. As analyzed above, only beam detector events need to be considered for 
agent x. For occupancy sensors, we maintain an active list as r is processed; addi- 
tional processing is needed only when deactivation of an occupancy sensor happens. 
To illustrate the procedure for building the composite graph, we use p from ([T]l and 
r from ([3]). Starting with the first beam detector recording, {bi,ti), if agent x is re- 
sponsible for it, then the reachable part before bi is crossed is the same as Gi from 
Fig- HI To emphasize that this graph is built from to to ti, we denote it as Gj. The 
next two recordings in r are activations of oi and 02. Since there are only activa- 
tions, which only cause more locations of the environment to become reachable, we 
store these in the active occupancy sensor list and continue. 

For the next recording, {b2,t4), three subgraphs need to be built, one start from A, 
one start from bi„, and one start from bi^. Following the naming convention of Gj, 
these should have names G4, G4', and G^^, respectively. To build G4, we need to 
keep vertices {A,b2i,b2r,0i,O2}. We also add {B,C} since these are vertices of^^p 
that are reachable from {A,b2i,b2r,oi,02} without crossing additional sensors. This 
gives us the subgraph in Fig. |6ja). To facilitate searching, for each pair of neigh- 
bors of an active occupancy sensor, we add an edge between them and remove the 
occupancy sensor vertex, which yields the graph in Fig. Ob). The pesudocode for 
building this subgraph, GetSubGraphMulti, is given in Algorithm ID Assum- 
ing we have obtained G4' and G4'^ similarly, the next sensor recordings is (02, fj). 



B A 



02 C 
(a) (b) 

Fig. 6 a) One possible connectivity subgraph, G4, during (?4,?5) when mulitple agents are in 
the workspace, b) Updated connectivity subgraph reflecting whether two vertices are reach- 
able without triggering additional sensor recordings. 



Algorithm 4. GetSubGraphMulti 

Input: G = {y,E), the start vertex s, ^'p, the active occupancy sensors O, and goal vertices 

Vg- 
Output: G' = (V',£'), the part of G that is reachable from j. 





G' ^ GetReachableSubgraph(G,.s,Vc, Vc) 
foralloe(OnV')do 

add to E' an edge between each pair of o's neighbors 

remove o from G' 
end for 
return G' 
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which corresponds to the deactivation of 02- For this event, we need to create five 
subgraphs starting from A,biu,bid,b2i,b2r with names G^,Gl^^ ^G^^^G^^C^-^, re- 
spectively. Since during [15,1 f], no new locations of G become reachable and no 
other beam sensor recordings happen, this part of r can be ignored. After connect- 
ing all these subgraphs based on sensor crossings, we obtain the composite graph 
Gj as illustrated in Fig.|7] 




Gl' 


K\ 


-• W. 


^21 • 
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'• ^id 


^21 •, , 




Gl 


62,. •' 


■•A 


^2} •' 
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•^» ^21 



Gf 



> K 



Fig. 7 Sketch of the composite graph Gj. 

Before continuing with searching Gs, we make one observation from above graph 
building procedure: Since each sensor recording may cause up to 0{m) new sub- 
graphs to be built, up to 0{m^) subgraphs may be built altogether. This is the num- 
ber of subgraphs in Gs since each subgraph appears once in Gs- To search through 
Gs for a path matching p, the same strategy from ValidateAgentStory can be 
applied. That is, a dynamic programming approach can be used in which a subprob- 
lem is a tail of p and a location in Gs- Since there are no more than Oirrt^) copies of 
Pi in Gs, there can be at most 0{m^) subproblems after each p, is processed. Sim- 
ilar to ValidateAgentStory, during the processing of each /?, each subgraph 
only needs to be considered once. This limit the time complexity of searching Gs 
at 0{nm^logm„). To get the total time complexity, we need the time for building 
Gs, which is irp- times the cost of the subroutine GetSubGraphMulti. The run- 
ning time of GetSubGraphMulti is determined by the loop at lines 3 through 6, 
which takes 0{m^^,) time. This yields the overall time complexity 0{np-{n\ogmn. + 
m^^,)). Since the algorithm operates much like ValidateAgentStory, we omit 
the pesudocode. 



6 Conclusion and Open Questions 



We introduced a decision problem in which a story of an autonomous agent is vali- 
dated against the observation history gathered by simple sensors placed in the same 
environment. We showed that sensor recordings act as temporary obstacles that con- 
strain agents' movements in the environment, effectively creating a sequence of 
connectivity subgraphs of the environment. Based on this observation, we designed 
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polynomial time algorithms that decide whether the agent's story is possible given 
the observation history and retrieves a possible path if one exists. 

As a first attempt at a new problem, more questions are opened than answered. 
Some immediate ones are: What if the agent can only recall a partial story? In the 
multiple agent case, what if every agent's story is valid but the combined story 
is inconsistent? In letting the agents move arbitrarily fast, we only addressed one 
discrete aspect of a general problem in this paper. A natural next step is to work on 
a kinematic agent model with bounded control inputs, which seems to require more 
careful analysis of the continuous state space and much richer behaviors. Since an 
agent's speed is limited, some plausible paths the agent could have taken will now 
be ruled out. Continuous state space also makes it possible to study optimality: a 
network of sensors can potentially detect whether an agent is performing its task 
most efficiently, in terms of time or distance traveled. This becomes more relevant 
as system designs become more and more complex; sometimes it is challenging to 
just get a feasible plan ifTTIfTSJI . Another interesting direction is to study optimal 
sensor placements for the detective task, which was discussed to some extent in 
12411281 . Furthermore, the story in this paper only contains "when" and "where". It 
is an intriguing open problem to check stories involving "what", "how", and "why"? 
Logic seems essential in investigating these elements of a story. 

Probabilistic formulations of the story validation problem may also be fruitful 
to explore. The agents in this paper are assumed to be nondeterministic in that the 
algorithms treat all possible paths equally. In a real environment, however, a typi- 
cal agent usually does have preferences when multiple choices are present. Models 
considering the transition probability of an agent from room to room could then un- 
cover the most likely path(s) taken by the agent. If none of the feasible paths taken 
by the agent are probable, the agent could still be considered as misbehaving. Prob- 
ability can also be introduced into sensor observation history. Sensors, no matter 
how reliable they are, may have false positives and false negatives. For example, a 
beam sensor may misfire (triggered by a mouse for instance) with small probability. 
This implies that sensors, unlike walls, may be better treated as "soft" obstacles. 

Acknowledgements. The authors thank Amir Nayyeri, Max Katsev, and the anonymous 
reviewers for helpful comments. This work was supported in part by NSF grants 0904501 
(IIS Robotics) and 1035345 (Cyberphysical Systems), DARPA SToMP grant HROOll-05-1- 
0008, and MURI/ONR grant N00014-09-1-1052. 



References 

1. Alur, R., Henzinger, T.A., Lafferriere, G., Pappas, G.J.: Discrete abstractions of hybrid 
systems. Proceedings of the IEEE, 971-984 (2000) 

2. Asarin, E., Bournez, O., Dang, T, Maler, O.: Approximate reachability analysis of 
piecewise-linear dynamical systems, pp. 21-31. Springer, Heidelberg (2000) 

3. Aslam, J., Butler, Z., Constantin, R, Crespi, V., Cybenko, G., Rus, D.: Tracking a moving 
object with a binary sensor network, pp. 150-161. ACM Press, New York (2002) 



406 J. Yu and S.M. LaValle 

4. Baryshnikov, Y., Ghrist, R.: Target enumeration via integration over planar sensor net- 
works. In: Proceedings of Robotics: Science and Systems IV, Zuricli, Switzerland (June 
2008) 

5. Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957) 

6. Castellanos, J., Montiel, J., Neira, J., Tardos, J.: The SPmap: A probabilistic framework 
for simultaneous localization and mapping. IEEE Transactions on Robotics & Automa- 
tion 15(5), 948-953 (1999) 

7. Chazelle, B.: Approximation and decomposition of shapes. In: Schwartz, J.T., Yap, C.K. 
(eds.) Algorithmic and Geometric Aspects of Robotics, pp. 145-185. Lawrence Erlbaum 
Associates, Hillsdale (1987) 

8. Cheng, P., Kumar, V.: Sampling-based falsification and verification of controllers for 
continuous dynamic systems. In: Proceedings Workshop on Algorithmic Foundations of 
Robotics (2006) 

9. Choset, H., Nagatani, K.: Topological simultaneous localization and mapping (T- 
SLAM). IEEE Transactions on Robotics & Automation 17(2), 125-137 (2001) 

10. Chutinan, A., Krogh, B.: Computational techniques for hybrid system verification. IEEE 
Transactions on Automatic Control 48(1), 64-75 (2003) 

11. Conner, D.C., Kress-gazit, H., Choset, H., Rizzi, A.A., Pappas, G.: Valet parking without 
a valet. In: lEEE/RSJ International Conference on Intelligent Robots & Systems (2007) 

12. Dehn, M.: Papers on Group Theory and Topology. Springer, Berlin (1987) 

13. Dissanayake, G., Newman, P., Clark, S., Durrant-Whyte, H.F., Csorba, M.: A solution to 
the simultaneous localisation and map building (SLAM) problem. IEEE Transactions on 
Robotics & Automation 17(3), 229-241 (2001) 

14. Epstein, D.B.A., Paterson, M.S., Camon, G.W., Hofi, D.F., Levy, S.V., Thurston, W.P: 
Word Processing in Groups. A. K. Peters, Natick (1992) 

15. Fang, Q., Zhao, F, Guibas, L.: Counting targets: Building and managing aggregates in 
wireless sensor networks. Technical Report P2002-10298, Palo Alto Research Center 
(June 2002) 

16. Hopcroft, J., Tarjan, R.: Efficient algorithms for graph manipulation. Communications 
of the ACM 16(6), 372-378 (1973) 

17. Kim, W., Mechitov, K., Choi, J., Ham, S.: On target tracking with binary proximity 
sensors. In: ACM/IEEE International Conference on Information Processing in Sensor 
Networks, pp. 301-308. IEEE Press, Los Alamitos (2005) 

18. Kress-gazit, H., Fainekos, G.E., Pappas, G.: Where's waldo? sensor-based temporal logic 
motion planning. In: Proceedings IEEE International Conference on Robotics & Au- 
tomation, pp. 3116-3121 (2007) 

19. LaValle, S.M.: Planning Algorithms. Cambridge University Press, Cambridge (2006), 
http : / /planning . cs . uiuc . edu/ 

20. LaValle, S.M.: Filtering and planning in information spaces. Technical report. Depart- 
ment of Computer Science, University of Illinois (October 2009) 

21. Montemerlo, M., Thrun, S., Koller, D., Wegbreit, B.: FastSLAM: A factored solution 
to the simultaneous localization and mapping problem. In: Proceedings AAAI National 
Conference on Artificial Intelligence (1999) 

22. Parr, R., Eliazar, A.: DP-SLAM: Fast, robust simultaneous localization and mapping 
without predetermined landmarks. In: Proceedings International Joint Conference on Ar- 
tificial Intelligence (2003) 

23. Piterman, N., Pnueli, A., Sa'ar, Y: Synthesis of reactive(l) designs. In: Proceedings Ver- 
ification, Model Checking, and Abstract Interpretation, pp. 364-380. Springer, Heidel- 
berg (2006) 



Cyber Detectives: Determining When Robots or People Misbeliave 407 

24. Rao, B.S.Y., Durrant-Whyte, H.F., Sheen, J.S.: A fully decentralized multi-sensor system 
for tracking and surveillance. International Journal of Robotics Research 12(1), 20-44 
(1993) 

25. Shrivastava, N., Mudumbai, R., Madhow, U., Suri, S.: Target tracking with binary prox- 
imity sensors: fundamental limits, minimal descriptions, and algorithms. In: Proc. 4th 
Internat. Conf. on Embedded Networked Sensor Systems, pp. 251-264. ACM Press, 
New York (2006) 

26. Singh, J., Kumar, R., Madhow, U., Suri, S., Cagley, R.: Tracking multiple targets using 
binary proximity sensors. In: Proc. Information Processing in Sensor Networks (2007) 

27. Thrun, S., Burgard, W., Fox, D.: A probabilistic approach to concurrent mapping and 
localization for mobile robots. Machine Learning 31(5), 1-25 (1998) 

28. Tovar, B., Cohen, P., LaValle, S.M.: Sensor beams, obstacles, and possible paths. In: 
Proceedings Workshop on Algorithmic Foundations of Robotics (2008) 

29. Tovar, B., Murrieta, R., LaValle, S.M.: Distance-optimal navigation in an unknown en- 
vironment without sensing distances. IEEE Transactions on Robotics 23(3), 506-518 
(2007) 

30. Yu, J., LaValle, S.M.: Tracking hidden agents through shadow information spaces. In: 
Proceedings IEEE International Conference on Robotics & Automation (2008) 

31. Yu, J., LaValle, S.M.: Probabilistic shadow information spaces. In: Proceedings IEEE 
International Conference on Robotics & Automation (2010) 



Gravity-Based Robotic Cloth Folding 

Jur van den Berg, Stephen Miller, Ken Goldberg, and Pieter Abbeel 



Abstract. We consider how the tedious chore of folding clothes can be performed 
by a robot. At the core of our approach is the definition of a cloth model that allows 
us to reason about the geometry rather than the physics of the cloth in significant 
parts of the state space. We present an algorithm that, given the geometry of the 
cloth, computes how many grippers are needed and what the motion of these grip- 
pers are to achieve a final configuration specified as a sequence of g-folds — folds 
that can be achieved while staying in the subset of the state space to which the ge- 
ometric model applies. G-folds are easy to specify and are sufficiently rich to cap- 
ture most common cloth folding procedures. We consider folds involving single and 
stacked layers of material and describe experiments folding towels, shirts, sweaters, 
and slacks with a Willow Garage PR2 robot. Experiments based on the planner had 
success rates varying between 5/9 and 9/9 for different clothing articles. 



1 Introduction 

An English patent for a clothes washing machine was issued in 1691. Since then, 
there have been many innovations in washing and drying, but folding of clothes 
remains a manual (and notoriously tedious) activity. In this paper, we present a 
geometric model and algorithms that are steps toward autonomous robot folding 
of clothes. Cloth is highly non-rigid, flexible, and deformable with an infinite- 
dimensional configuration space. We consider articles of clothing that can be 
described by a simple polygonal boundary when lying flat on a horizontal surface. 

We introduce a deterministic geometric model of cloth motion based on gravity 
and assumptions about material properties. We constrain robot motion such that at 
all times, one part of the cloth is lying horizontally on the table and one part (pos- 
sibly empty) hangs vertically from the grippers parallel to the gravity vector. This 
allows a configuration of the cloth to be fully determined by the line that separates 
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Fig. 1 Folding a long-sleeve into a square using a sequence of seven g-folds. Red g-folds 
apply to the geometry that was folded in the preceding g-fold. Blue g-folds apply to the entire 
geometry. 

the horizontal and the vertical parts. We call this line the baseline, which defines a 
2-D configuration space for the material. 

Given polygonal geometry of the cloth, number of grippers, and desired fold se- 
quence (see Fig. [l]), we present an algorithm that computes a motion plan for the 
grippers that moves the cloth through the C-space to reach the desired final arrange- 
ment, or a report that no such motion plan exists. We implemented the algorithm on a 
Willow Garage PR-2 robot and report experiments folding towels, t-shirts, sweaters, 
and slacks. 

The remainder of this paper is organized as follows. In Section |2] we discuss 
related work. In Section [3] we define the problem. In Section]?] we describe our al- 
gorithm to compute the manipulation motion of the robot to execute a given folding 
sequence using g-folds. In Section]5]we report experimental results folding towels, 
t-shirts, and slacks using a Willow Garage PR-2 robot. We conclude in Section]6] 



2 Related Work 

The work that is most related to ours is the work of Bell and Balkcom jJSj]!!. In 
m, the grasp points are computed to immobilize a polygonal non-stretchable piece 
of cloth. Gravity is used to reduce the number of grasp points to hold cloth in a 
predictable configuration, potentially with a single fold, using two grippers in Q. 
We extend this work and include a folding surface. We assume that points that are 
lying on a table are fixed by friction and gravity, and need not be grasped. The work 
of IS also shows how to fold a t-shirt using the Japanese method; the fold can be 
achieved by grasping the cloth at three points without regrasping. 
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In Q, the robot handling of cloth material is discussed and some specific folds 
are shown. The work of 1141 also discusses a specific folding manipulation. The 
work of lfT2l deals specifically with folding towels. The work focuses on visual 
detection of the vertices of the towel, and use a scripted motion to achieve folds 
using a PR-2 robot. We build on the results of this work in our experiments. 

There is also quite a large body of work on cloth simulation, which simulates the 
behavior of cloth under manipulation forces using the laws of physics |[2l[5][6l . In our 
work, we manipulate cloth such that it is always in a configuration that allows us to 
reason about the geometry of the cloth, rather than about its physics and dynamics. 

Folding has been extensively studied in the context of origami ||T|[9| [T01 . Origami, 
or paper folding, is fundamentally different from cloth folding, since unfolded re- 
gions of the paper are considered to be rigid facets connected by "hinges" that model 
the creases in the paper. In contrast, cloth material is flexible everywhere. Yet, we 
draw from results in paper folding in our work. Applications of paper folding out- 
side origami include box folding ifTSlfTTl and metal bending [SJ, where the material 
model is essentially the same as that of paper. 



3 Problem Description 

We assume gravity acting in the downward vertical (— z) direction and a sufficiently 
large planar table in the horizontal (xy) plane. We assume the article of clothing can 
be fully described by a simple polygon (convex or non-convex) initially lying on 
the horizontal surface. We are given the initial n vertices of the polygonal cloth in 
counterclockwise order. 

We make the following standard assumptions on the cloth material: 

1. The cloth has infinite flexibility. There is no energy contribution from bending. 

2. The cloth is non-stretchable. No geodesic path lengths can be increased. 

3. The cloth has infinite friction with the surface on which it lies and with itself. 

4. The cloth has zero thickness. 

5. l!Yiec\oih is subject to gravity. 

6. The cloth has no dynamics. 

At the core of our approach is the following additional assumption, which we call 
the downward tendency assumption: 

7. If the cloth is held by a number of grippers, and one or more grippers release the 
cloth, no point of the cloth will move upwards as a result of gravity and internal 
forces within the cloth. 

This assumption does not directly follow from physics, rather it is an approxima- 
tion which seems to match the behavior of reasonably shaped cloth, such as every- 
day clothing articles, surprisingly welU, and it allows us to reason purely about the 
geometry rather than the physics of the cloth. 

' The assumption is not accurate for an exotic family of shapes called pinwheels, as shown 
in ID. 
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Fig. 2 Examples of vertical parts of cloths in various configurations. In order for the cloth not 
to be immobilized, all convex vertices not at the baseline at which the negative gravity vector 
(small arrows) does not point into the cloth must be grasped. These vertices are indicated by 
the dots. 



The downward-tendency assumption allows the cloth to be held by the grippers 
such that one section lies horizontally on the surface and another section hangs 
vertically. The line that separates the horizontal and the vertical parts is called the 
baseline. To ensure deterministic behavior of the cloth, the grippers must be ar- 
ranged so that the vertical section is immobilized. The points that are lying on the 
surface (including those on the baseline) are immobilized, as they cannot move in 
the plane due to friction and will not move upward per the downward-tendency as- 
sumption, so they need not be grasped. Fig. |2]shows an example, where points of the 
cloth are held by grippers. To make sure that the vertical part of the cloth is immo- 
bilized, it turns out that every convex vertex of the vertical part of the cloth at which 
the negative gravity vector does not point into the cloth polygon must either be held 
by a gripper or be part of the baseline. This follows from the following theorem: 

Theorem 1. In our material model, a vertically hanging cloth polygon is immobi- 
lized when every convex vertex of the cloth at which the negative gravity vector does 
not point into the cloth polygon is fixed (i.e. be held by a gripper or be part of the 
baseline). 

Proof: By j3), we know that a non-stretchable planar tree is fully immobilized if 
each node of the tree of which its incident edges do not positively span R^ is fixed. 
Now, let us define an upper string of a polygon as a maximal sequence of edges of 
which the extreme vertices are convex vertices of the polygon, and no part of the 
polygon lies above the edges (see Fig.|3la)). A given polygon P can have multiple 
upper strings, but has at least one. 

For each upper string holds that at its convex vertices the negative gravity vector 
points outside the polygon. As these convex vertices are fixed (by a gripper), the 
entire set of edges the string consists of is immobilized. This can be seen by adding 
virtual vertical edges fixed in gravity pointing downward from the non-convex ver- 
tices, which make sure that the non-convex vertices cannot move upward (per the 
downward-tendency assumption). The incident edges of the non-convex vertices 
now positively-span M^, hence the entire string is immobilized. 
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Fig. 3 (a) A polygon with two upper strings shown thick, (b) The white part of the polygon 
(including the vertical dashed edges) has proven immobilized. The grey part remains. 

Now, every point of the polygon P that can be connected to an upper string by a 
vertical line segment that is fully contained within P is immobilized. This is because 
this point cannot move downward per the non-stretchability assumption (note that 
the upper string is immobilized), and it cannot move upward per the downward- 
tendency assumption. Hence, all such points can be "removed" from P - they 
have been proven immobilized. What remains is a smaller polygon P' (potentially 
consisting of multiple pieces) for which immobilization has not been proven (see 
FigEtb)). The smaller polygon P' has vertical edges that did not belong to the orig- 
inal polygon P. The points on these vertical edges are immobilized, including both 
incident vertices (of which the upper one may be a non-convex vertex of P that is 
convex in P'), as they vertically connect to the upper string. 

Then, the proof recurses on the new polygon P' , of which the convex vertices of 
the upper string(s) need to be fixed. Note that P' may have convex vertices that were 
non-convex in P. These need not be fixed, as they were already proven immobilized 
since they are part of the vertical edge of P' . 

This proves the theorem. Note that convex vertices where the negative gravity 
vector points into the polygon will never be part of an upper string at any phase of 
the proof, so they need not be fixed. Also, the recursion "terminates." This can be 
seen by considering the vertical trapezoidal decomposition of the original polygon 
P, which contains a finite number of trapezoids. In each recursion step, at least one 
trapezoid is removed from P, until the entire polygon has proven immobilized. D 



A g-fold (g refers to gravity) is specified by a directed line segment in the plane 
whose endpoints lie on the boundary of the cloth polygon. The segment partitions 
the polygon into two parts, one to be folded over another (see Fig.|4ll. A g-fold is 
successfully achieved when the part of the polygon to the left of the directed line 
segment is folded across the line segment and placed horizontally on top of the other 
part, while maintaining the following property: 

• At all times during a folding procedure, every part of the cloth is either hori- 
zontal or vertical, and the grippers hold points on the vertical part such that it is 
immobilized (see Fig.|5ll. 

This ensures that the cloth is in a fully predictable configuration according to our 
material model at all times during the folding procedure. Not all folds can be 
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achieved using a g-fold; in terms of III, valley folds can be achieved using a g-fold, 
but mountain folds cannot. 

A g-fold sequence is a sequence of g-folds as illustrated in Fig.[T] After the initial 
g-fold, the stacked geometry of cloth allows us to specify two types of g-fold: a "red" 
g-fold and a "blue" g-fold. A blue g-fold is specified by a line segment partitioning 
the polygon formed by the silhouette of the stacked geometry into two parts, and is 
successfully achieved by folding the (entire) geometry left of the line segment. A red 
g-fold is similarly specified, but only applies to the (potentially stacked) geometry 
that was folded in the previous g-fold (see Fig. [Hi. 

We are given a robot with k point grippers that can grasp the cloth at any point on 
the boundary of the polygon formed by the silhouette of the stacked geometry. At 
each such point, the gripper will grasp all layers of the stack at that point (i.e., it is 
not capable of distinguishing between layers). Each of the grippers is able to move 
independently above the x}'-plane and we assume that gripper motion is exact. 

The problem we discuss in this paper is then defined as follows. Given a speci- 
fication of a sequence of g-folds, determine whether each of the folds are feasible 
given the number of grippers available, and if so, compute the number of grippers 
needed and the manipulation motion for each of the grippers to achieve the g-folds. 



4 Planning G-Folds 

4.1 Single G- Folds on Unstacked Geometry 

We first discuss the case of performing a single g-fold of the original (unstacked) 
polygon. During the manipulation, the cloth must be separated in a vertical part and 
a horizontal part at all times. The line separating the vertical part and the horizontal 
part is called the baseline. 

Given a polygonal cloth and a specification of a g-fold by a directed line seg- 
ment (e.g. the first g-fold of Fig. |Ha)), we plan the manipulation as follows. The 
manipulation consists of two phases: in the first phase, the cloth is manipulated 
such that the part that needs to be folded is brought vertical above the line segment 




Fig. 4 (a) A g-fold is specified by a directed line segment partitioning the (stacked) geometry 
into two parts. The g-fold is successfully achieved when the part of the geometry left of the 
line segment is folded around the line segment. A sequence of two g-folds is shown here, (b) 
A g-fold sequence similar to (a), but the second g-fold (a red g-fold) is specified such that it 
only applies to the part of the cloth that was folded in the previous g-fold. 
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(1) 



(2) 



(3) 



(4) 



(5) 



Fig. 5 The motion of two grippers (arrows) successfully performing the first g-fold specified 
in Fig.|4la) shown both in 3-D view and top-view. At all times during the manipulation, all 
parts of the cloth are either vertical or horizontal and immobilized. The boundary between 
the vertical part and the horizontal part of the cloth is called the baseline. 



specifying the g-fold (see Figs. [5]-(l), |5}(2), and|5]-(3)). In the second phase, the 
g-fold is completed by manipulating the cloth such that the vertical part is laid down 
on the surface with its original normal reversed (Figs.|5l-(3),[5]-(4), and|5l-(5)). 

The first phase is carried out as shown in Figs.|5]-(l),|5]-(2) and[5]-(3), manipulat- 
ing the cloth such that the baseline of the vertical part is parallel to the line segment 
at all times. Initially, the "baseline" is outside the cloth polygon (meaning that there 
is no vertical part) and is moved linearly towards the line segment specifying the 
g-fold. In the second phase, the g-fold is completed by laying down the vertical part 
of the cloth using a mirrored manipulation in which the baseline is again parallel to 
the line segment at all times. Initially the baseline is at the line segment specifying 
the g-fold and in moved linearly outward until the baseline is outside the folded part 
of the polygon (see Figs.H(3),|5}(4) and|3(5)). 

The corresponding motions of the grippers holding the vertices can be computed 
as follows. Let us assume without loss of generality that the line segment specifying 
the g-fold coincides with the x-axis and points in the positive x-direction. Hence, 
the part of the polygon above the x-axis needs to be folded. Each convex vertex of 
this part in which the positive v-vector points outside of the cloth in its initial con- 
figuration needs to be held by a gripper at some point during the manipulation. We 
denote this set of vertices by V and can be computed in 0{n) time. Let y* be the 
maximum of the }'-coordinates of the vertices in V. Now, we let the baseline, which 
is parallel to the x-axis at all times, move "down" with speed 1, starting at yi, = y*, 
where yi, denotes the y-coordinate of the baseline. Let the initial planar coordinates 
of a vertex v G V be (xyj^v). As soon the baseline passes y^, vertex v starts to be ma- 
nipulated. When the baseline passes — y,,, vertex v stops to be manipulated. During 
the manipulation, the vertex is held precisely above the baseline. In general, the 3-D 
coordinate {x{yb),y{yb),z{yb)) of the gripper holding vertex v as a function of the 
j-coordinate of the baseline is: 
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xiyb) 



yiyb) =yb, 



z{yb) =yv-\yb\, 



(1) 



for yi, G [jv, —yx] . Outside of this interval, the vertex is part of the horizontal part of 
the cloth and does not need to be grasped by a gripper. This reasoning applies to all 
vertices v G y. When the baseline has reached —y*, all vertices have been laid down 
and the g-fold is completed. As a result, we do not need to grasp any vertex outside 
of V at any point during the manipulation, and the motions of the vertices in V can 
be computed in 0{n) time. 

4.2 Sequences of G-Folds and Stacked Geometry 

We now discuss the case of folding already folded geometry. First, we discuss how 
to represent folded, stacked geometry. Let us look at the example of the longsleeve 
t-shirt of Fig.[TJ and in particular at the geometry of the cloth after five g-folds. The 
creases of the folds have subdivided the original polygon into facets (see Fig.[6|a)). 
With each such facet, we maintain two values: an integer indicating the height of the 
facet in the stacked geometry (1 is the lowest) and a transformation matrix indicat- 
ing how the facet is transformed from the original geometry to the folded geometry. 
Each transformation matrix is a product of a subset of the matrices Ff that each cor- 
respond to the mirroring in the line segment specifying the /'th g-fold. In Fig.jSJb), 
we show the lines of each of the g-folds with the associated matrix Fi. 

Given the representation of the current stacked geometry and a line segment spec- 
ifying a new g-fold, we show how we manipulate the cloth to successfully perform 
the g-fold or report that the g-fold is infeasible. We assume that the line segment 
specifying the g-fold partitions the silhouette of the stacked geometry into two parts 
(i.e., a blue g-fold). Let us look at the sixth specified g-fold in the longsleeve t-shirt 
example, which folds the geometry of Fig.|6l 

Each facet of the geometry (in its folded configuration) is either fully to the left 
of the line segment, fully to the right, or intersected by the line segment specifying 
the g-fold. The facets intersected by the line segment are subdivided into two new 



Fi\ ; Fi Fi Fi /Ft 




Fig. 6 (a) The representation of folded stacked geometry. The example shown here is the 
longsleeve t-shirt of Fig. [T] after five g-folds. With each facet, the stack height (integer) and a 
transformation matrix is stored, (b) Each transformation matrix Fi corresponds to mirroring 
the geometry in the line segment specifying the f th g-fold. 
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Fig. 7 (a) The geometry in the longsleeve t-shirt example after subdividing the facets by 
the line segment specifying the sixth g-fold. The gray facets need to be folded. The convex 
vertices for which the negative gravity vector points outside of the facet are shown using dots, 
(b) The vertex marked by the dot need not be held by a gripper to perform the fold, as it is 
non-convex in the dark gray facet (even though it is convex in the light gray facet). 



facets, both initially borrowing the data (the stack height and the transformation 
matrix) of the original facet. Now, each facet will either be folded, or will not be 
folded. Fig. |7{a) shows the new geometry in the longsleeve t-shirt example after 
subdividing the facets by the line segment specifying the g-fold. The gray facets 
need to be folded. 

As in the case of folding planar geometry, for each facet each convex vertex at 
which the gravity vector points outside of the facet at the time it is above the line 
segment specifying the g-fold should be held by a gripper, and each non-convex 
vertex and each convex vertex at which the negative gravity vector points inside the 
facet need not be held by a gripper. If multiple facets share a vertex, and according 
to at least one facet it needs not be held by a gripper, it does not need to be held by 
a gripper (see Fig.lTJb)). 

For the t-shirt example, the vertices that need to be grasped are shown using dots 
in Fig. 13 a) and labeled vi , . . . , vy. Applying the transformation matrices stored with 
the incident facet to each of the vertices shows that vi, V3, V5, and V7 will coincide 
in the plane. As a gripper will grasp all layers the geometry, only one gripper is 
necessary to hold these vertices. Vertex V4 also needs to be held by gripper. Vertices 
V2 and vg remain, but they need not be grasped. This is for the following reason. As 
can be seen in Fig.[Tl these vertices are fully covered. That is, the vertex is "hidden" 
behind other facets of the cloth both below and above it in the stacked geometry. As 
we assume that the friction between two pieces of the cloth is infinite, this vertex 
will not be able to move as a result of gravity, and need not be grasped. Using the 
heights stored at each facet, we can compute for each vertex whether it is covered 
or not. 

This defines fully what vertices need to be grasped to achieve a g-fold of stacked 
geometry. If any such vertex is not on the boundary of the silhouette of the stacked 
geometry, the g-fold is infeasible (for example, the second g-fold of Fig. |4](a) is 
infeasible for this reason). The 3-D motion of the grippers can be computed in the 
same way as for planar geometry, as discussed in Section 14.11 The running time 
for computing the vertices that need to be grasped is in principle exponential in the 
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number of g-folds that preceeded, as in the worst case ; g-folds create 2' facets. If we 
consider the number of g-folds a constant, the set of vertices that need to be grasped 
can be identified in 0{n) time. 

After the g-fold is executed, we need to update the data fields of the facets that 
were folded in the geometry: each of their associated transformation matrices is 
pre-multiplied by the matrix Fi corresponding to a mirroring in the line segment 
specifying the g-fold (F(, in Fig. ^b) for the t-shirt example). The stack height of 
these facets is updated as follows: the order of the heights of all facets that are folded 
are reversed and put on top of the stack. In the example of Fig. EJa), the facets that 
are folded have heights 4, 6, 1, and 3 before the g-fold, and heights 8, 7, 10, and 9 
after the g-fold, respectively. 

The above procedure can be executed in series for a sequence of g-folds. Initially, 
the geometry has one facet (the original polygon) with height 1 and transformation 
matrix / (the identity matrix). If a g-fold is specified to only apply to the folded part 
of the geometry of the last g-fold (a "red" g-fold), the procedure is the same, but 
only applies to those facets that were folded in the last g-fold. We allow these kinds 
of g-folds as a special primitive if they need the same set of vertices to be grasped 
as the previous g-fold. Even if the vertices that are grasped are not on the boundary 
of the silhouette of the geometry, the g-fold can be achieved by not releasing the 
vertices after the previous g-fold. This enriches the set of feasible fold primitives. 

5 Experiments 

5.1 Experimental Setup 

We used a Willow Garage PR2 robotic platform flS], shown in Fig. [H The PR2 
has two articulated 7-axis arms with parallel jaw grippers. We used a soft working 




Fig. 8 The PR2 robotic platform (developed by Willow Garage) performing a g-fold on a 
towel. 
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Fig. 9 Three of each clothing category were used in conducting our experiments. 



'T T^ 




Fig. 10 An example sequence of user-specified folds. The user first clicks on the left arm-pit, 
then on the left shoulder to specify the first fold. The GUI then verifies that this is a valid 
g-fold for the chosen number of grippers. In this case it is, and it then shows the result after 
executing the g-fold (3rd image in the top row). Then the user specifies the next fold by 
two clicks, the program verifies whether it's a valid g-fold, and then shows the result after 
executing the g-fold. 



surface, so the relatively thick grippers can easily get underneath the cloth. Our 
approach completely specifies end-effector position trajectories. It also specifies the 
orientation of the parallel jaw grippers' planes. We used a combination of native IK 
tools and a simple linear controller to plan the joint trajectories. 

We experimented with the clothing articles shown in Fig.|9l Whenever presented 
with a new, spread-out clothing article, a human user clicks on the vertices of the 
article in an image. The user is then presented with a GUI that allows them to specify 
a sequence of folds achievable with the two grippers. Once a valid g-fold has been 
specified, the GUI executes the fold and the user can enter the next fold. Fig. [T0| 
illustrates the fold sequence specification process through an example. 
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Fig. 11 An example folding primitive, automatically executed on a T-shirt polygon. Note the 
clean fold, despite the imperfect symmetry of the original polygon. 



As many sophisticated folds are difficult, if not impossible, to perfectly define 
by hand, the GUI is also seeded with a set of folding primitives. When presented 
with a particular article of clothing, the user is given the option of calling one of 
these primitives. Once called, a sequence of folds is computed, parametrized on a 
number of features such as scaling, rotation, and side lengths. Fig. [TT| shows an 
example primitive being executed on a user-defined polygon in the shape of a shirt. 
To ensure consistency across multiple trials, such primitives were used to execute 
the folds detailed in the Experimental Results section below. 

While our approach assumes the cloth has zero resistance against bending, real 
cloth does indeed resist against bending. As a consequence, our approach outlined 
so far overestimates the number of grippers required to hold a piece of cloth in a pre- 
dictable, spread-out configuration. Similarly, our robot grippers have non-zero size, 
also resulting in an overestimation of the number of grippers required. To account 
for both of these factors, our implementation offers the option to allocate a radius to 
each of our grippers, and we consider a point being gripped whenever it falls inside 
this radius. To compute the grip points, we first compute the grip points required for 
point grippers and infinitely flexible cloth. We then cluster these points using a sim- 
ple greedy approach. We begin by attempting to position a circle of fixed radius in 
the coordinate frame such that it covers the maximum number of grip points, while 
subsequently minimizing the average distance from each covered point to its center. 
This process is iterated until no point remains uncovered. For the duration of the 
fold, our grippers now follow the trajectory of the center of each cluster, rather than 
individual grip points. 
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Fig. 12 The user-requested sequences of folds used in our experiments. 

5.2 Experimental Results 

We tested our approach on four categories: towels, pants, short-sleeved shirts, and 
sweaters. Fig. [T2| shows the fold sequences used for each category. To verify ro- 
bustness of our approach, we tested on three instances of each category of clothing. 
These instances varied in size, proportion, thickness, and texture. At the beginning 
of each experimental trial, we provided the PR2 with the silhouette of the polygon 
through clicking on the vertices in two stereo images. 

Fig. [13] shows the robot going through a sequence of folds. Table [T] shows suc- 
cess rates and timing on all clothing articles. As illustrated by these success rates, 
our method demonstrates a consistent level of reliability on real cloth, even when the 
manipulated fabric notably strays from the assumptions of our model. For instance, 
the g-fold method worked reasonably well on pants, despite the material's clear 
violation of the assumption of non-zero thickness, and a three-dimensional shape 
which was not quite polygonal. It was also able to fold a collared shirt with perfect 



Table 1 Experimental results of autonomous cloth folding. 



Category 


Success rate 


Avg time (s) 


Category 


Success rate 


Avg time (s) 


Towels 


9/9 


200.0 


Short-Sleeved Shirts 


7/9 


337.6 


Purple 


3/3 


215.6 


Pink T-Shirt 


2/3 


332.8 


Leopard 


3/3 


210.9 


Blue T-Shirt 


2/3 


343.2 


Yellow 


3/3 


173.5 


White Collared 


3/3 


337.6 


Pants 


7/9 


186.6 


Long-Sleeved Tops 


5/9 


439.0 


Long Khaki 


3/3 


184.9 


Long-Sleeved Shirt 


2/3 


400.7 


Brown 


1/3 


185.9 


Gray Sweater 


1/3 


458.4 


Short Khaki 


3/3 


189.1 


Blue Sweater 


2/3 


457.8 
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Fig. 13 The robot folding a t-shirt using our approach. 



accuracy, despite a rigid collar and buttons, neither of which are expressible in the 
language of our model. This level of practical success is indicative of a certain ro- 
bustness to our approach, which lends itself to a number of implications. The first is 
that despite the simplifications inherent to our model, real cloth behaves determin- 
istically under roughly the same conditions as its ideal counterpart. While human 
manipulation of cloth exploits a number of features which our model neglects, these 
features generally arise in states which our model considers unreachable. That is, 
handling true fabric often requires less caution than our g-fold model predicts, but 
rarely does it require more. The second is that even when unpredicted effects do 
arise, the final result is not compromised. Although factors such as thickness may 
cause the cloth to deviate slightly from its predicted trajectory - most often in the 
form of "clumping" for thick fabrics - the resulting fold generally agrees with the 
model, particularly after smoothing. 

Much of our success can be attributed to a number of assumptions which real 
cloth very closely met: namely, the infinite friction between the cloth and the table, 
and the infinite friction between the cloth and itself. The former allowed us to exe- 
cute g-folds even when the modeled polygon did not perfectly match the silhouette 
of the cloth. As actual articles of clothing are not comprised solely of well-defined 
corners, this imprecision often resulted in a nonzero horizontal tension in the cloth 
during the folding procedure. However, as the friction between the cloth and the 
table far outweighs this tension, the cloth remained static. The latter allowed us to 
stabilize loose vertices by "sandwiching" them between two gripped portions of 
cloth. This technique, in combination with the robust gripping approach detailed 
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above, allowed us to execute a number of folds (such as the shirt folds in Fig.[T2ll 
which more closely resembled their standard human counterpart. With the exception 
of long-sleeved shirts, all sequences could be perfectly executed by a pair of point 
grippers. However, some relied on the ability to create perfect 90 degree angles, or 
perfectly align two edges which (in actuality) were not entirely straight. Exact pre- 
cision was impossible in both of these cases; but where there was danger of gravity 
influencing a slightly unsupported vertex, the friction of the cloth, in conjunction 
with its stiffness, often kept it in a stable configuration. 

The trials were not, however, without error. Most often, failure was due to the 
limitations of our physical control, rather than a flaw in our model. For instance, 
2/2 Short-Sleeved failures and 3/4 Long-Sleeved failure occurred at steps where 
the robot was required to grasp a portion of previously folded sleeve (Short-Sleeve 
Steps 2 and 4, Long-Sleeve Steps 3 and 6 in Fig.fT2]| In all cases, the failure could be 
easily predicted from the location of the initial grasp. Either the robot did not reach 
far enough and grasped nothing, or reached too far and heavily bunched the cloth. 
These failures suggest a clear issue with our implementation: namely, the reliance 
on open-loop control. While the initial position of each vertex is given, the location 
of a folded vertex must be derived geometrically. For this location to be correct, 
we must make two assumptions: that the cloth at hand is perfectly represented by 
the given polygon, and that the trajectory, once computed, can be exactly followed. 
Clearly, both are idealizations: the former disregards the multi-layered nature of all 
but towels (which saw a 100% success rate) and the latter is hindered by the in- 
herent imprecision of any robotic mechanism. A closed-loop method, which would 
allow the robot to adjust the shape of the modeled polygon to allow for real-world 
discrepancies, would likely eliminate these issues. 

In addition, the robot moved very slowly in our trials, leading to large running 
times. The reason for moving slowly, besides the fact that it helps the cloth behave in 
a more deterministic fashion (no dynamics effects), is that it led to higher accuracy 
of the motions of the arms and base of the robot, which was required for open-loop 
control. Also here, a closed-loop method would likely enable performance improve- 
ments. 

Videos of our experimental results are available at http://rll.berkeley.edu/wafrlO- 
gfolds/. 

6 Conclusion and Future Work 

We described a geometric approach to cloth folding — avoiding the difficulties with 
physical simulation of cloth. To do so, we restrict attention to a limited subset of 
cloth configuration space. Our experiments show that (i) this suffices to capture in- 
teresting folds, and (ii) real cloth behaves benignly, even when moderately violating 
our assumptions. Our approach enabled reliable folding of a wide variety of clothing 
articles. 

In the experiments we performed, a human user had to click on the vertices of 
the clothing article, and would then select a fold sequence. We plan to develop 
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computer vision algorithms that enable automatic recognition of clothing article 
categories, as well as specific geometric instances thereof. The robot could then look 
up the desired folding sequence for the presented article. We are also working on 
simple primitives that will enable taking clothing articles from crumpled, unknown 
configurations to the spread-out configuration. 

Careful inspection of the gripper paths shows that a single very large parallel 
jaw gripper would suffice to execute a g-fold requiring an arbitrary number of point 
grippers. We plan to investigate a practical implementation of this idea for the PR2. 
A large such gripper would reduce the collision free workspace volume significantly. 
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